

MRSkew - 4 years 6 months ago #337

Kyle Hailey posted a blog about how to determine whether a physical I/O was a disk hit or a file system cache hit. He says a trace file cannot tell you this, so he wrote a Perl script to produce a histogram of I/O latency.

Is this something MRSkew can do? I looked at the man page but did not see anything about it.


Re:MRSkew - 4 years 6 months ago #338

  • Jeff Holt
Hi, Jimmy.

I think what you're saying is that tkprof doesn't produce histograms for read latencies. In fact, it doesn't produce any histograms. Many profiling products don't show histograms, but ours does.

What Kyle is saying is that you cannot tell from the trace data if a read system call was satisfied by the filesystem cache, the I/O subsystem's cache, or one or more I/O subsystem disks.

What Kyle says is true. In fact, unless you're a whiz at debugging/tracing device drivers, there's no way to know with 100% certainty where a read call was satisfied from. Not even dtrace, strace, ltrace, or .*trace can do such a thing.

What Kyle did with his Perl script is produce a histogram from the trace data and assume that reads satisfied in less than 10 microseconds were satisfied by the filesystem cache.
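A minimal Python sketch of that kind of analysis (not Kyle's actual Perl script), assuming Oracle 10g+ extended SQL trace WAIT lines whose `ela=` field is in microseconds, and assuming single-block reads appear as `'db file sequential read'`. The 10 µs cutoff is Kyle's assumption, not a measured boundary; the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed format of an Oracle extended SQL trace WAIT line (10g+, ela in microseconds), e.g.
# WAIT #1: nam='db file sequential read' ela= 7 file#=4 block#=100 blocks=1 obj#=1 tim=1
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<nam>[^']*)' ela=\s*(?P<ela>\d+)")

# Kyle's assumption: reads faster than this were served by the filesystem cache.
FS_CACHE_THRESHOLD_US = 10


def read_latencies(lines, event="db file sequential read"):
    """Yield the ela value (microseconds) of every WAIT line for the given event."""
    for line in lines:
        m = WAIT_RE.match(line)
        if m and m.group("nam") == event:
            yield int(m.group("ela"))


def split_by_cache(latencies, threshold_us=FS_CACHE_THRESHOLD_US):
    """Count reads presumed to be filesystem-cache hits vs. all other reads."""
    counts = Counter()
    for ela in latencies:
        counts["fs cache" if ela < threshold_us else "other"] += 1
    return counts
```

Feed it the lines of a trace file, e.g. `split_by_cache(read_latencies(open("x.trc")))`, where `x.trc` is a placeholder name. The result only splits reads at one threshold; it says nothing definitive about where the slower reads were satisfied.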

Again, Kyle's assumption is correct. In fact, I would go further and say that, generally, any read call satisfied in less time than it takes for a single round trip to/from the I/O subsystem must have been satisfied by the filesystem cache. Typically, that's going to be anything less than 100 microseconds for uncongested, low-latency networks.
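The round-trip rule above can be written as a one-line test per read. This is a hypothetical Python sketch: `likely_source` is an illustrative name, and the 100 µs default is only the typical figure quoted for uncongested, low-latency networks, so measure your own I/O subsystem's round trip before relying on it. Reads at or above the round trip are reported as undetermined, because the trace alone cannot say where they were satisfied.

```python
def likely_source(ela_us: float, round_trip_us: float = 100.0) -> str:
    """Apply the round-trip rule of thumb to a single read's latency.

    round_trip_us is an assumed value (~100 us for an uncongested,
    low-latency path); substitute a measured round trip for your system.
    """
    if ela_us < round_trip_us:
        # Too fast to have made even one trip to the I/O subsystem and back.
        return "filesystem cache"
    # Slow enough that the trace alone cannot tell: filesystem cache,
    # I/O subsystem cache, or one or more disks are all still possible.
    return "undetermined"
```

For example, `[likely_source(e) for e in (7, 60, 4500)]` gives `['filesystem cache', 'filesystem cache', 'undetermined']`. Kyle's 10 µs cutoff is the same test with a much more conservative `round_trip_us`.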

Yes, mrskew can produce a histogram for any predicate you give it. In fact, there is an rc file that does just that. Look in the rc directory and you'll see a disk rc file that specifies mrskew command line arguments that might be useful for traditional rotational disks. There is also an ssd rc file that does the same thing but uses buckets with much smaller latencies.
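To illustrate what bucket choice does to such a histogram, here is a hypothetical Python sketch (not mrskew's implementation) that counts read latencies into half-open buckets. The two boundary lists are made up for illustration and are NOT the values in the disk or ssd rc files; the latencies in the usage block are invented sample data.

```python
import bisect
from collections import Counter

# Hypothetical bucket boundaries in microseconds, chosen for illustration only;
# they are NOT the values in mrskew's disk or ssd rc files.
DISK_BUCKETS_US = [1_000, 5_000, 10_000, 20_000, 50_000]  # rotational-disk scale
SSD_BUCKETS_US = [10, 50, 100, 250, 500, 1_000]           # much smaller latencies


def histogram(latencies_us, bounds):
    """Count latencies into half-open buckets [lo, hi) set by sorted bounds.

    The first bucket starts at 0; the last one is open-ended (hi is None).
    """
    counts = Counter()
    for ela in latencies_us:
        i = bisect.bisect_right(bounds, ela)  # number of bounds <= ela
        lo = bounds[i - 1] if i > 0 else 0
        hi = bounds[i] if i < len(bounds) else None
        counts[(lo, hi)] += 1
    return counts


def show(counts):
    """Print one row per non-empty bucket, in ascending order of lower bound."""
    for (lo, hi), n in sorted(counts.items()):
        label = f"[{lo}, {hi})" if hi is not None else f"[{lo}, inf)"
        print(f"{label:>16} us {n:5d} {'#' * n}")


if __name__ == "__main__":
    sample = [4, 7, 60, 180, 900, 4_500, 12_000]  # made-up read latencies
    show(histogram(sample, SSD_BUCKETS_US))
    show(histogram(sample, DISK_BUCKETS_US))
```

With the disk-scale buckets, every sub-millisecond read lands in one `[0, 1000)` bucket, so cache hits are invisible; the SSD-scale buckets separate them but lump all slower reads together.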

Actually, I like a version that merges both, so I don't have to think about it.
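One way to get a single bucket set that suits both scales is a logarithmic, 1-2-5-per-decade series. This is a hypothetical Python sketch, not how mrskew or its rc files do it; `start_us`, `stop_us`, and the 1-2-5 steps are illustrative choices.

```python
import bisect


def log_buckets(start_us=10, stop_us=100_000, steps=(1, 2, 5)):
    """Build 1-2-5 style bucket bounds per decade, from start_us up to stop_us.

    One sorted list covers both SSD-scale (tens of microseconds) and
    rotational-disk-scale (tens of milliseconds) latencies.
    """
    bounds = []
    decade = start_us
    while decade <= stop_us:
        for s in steps:
            b = decade * s
            if b <= stop_us:
                bounds.append(b)
        decade *= 10
    return bounds


def bucket_of(ela_us, bounds):
    """Return the half-open bucket (lo, hi) holding ela_us; hi is None past the last bound."""
    i = bisect.bisect_right(bounds, ela_us)
    lo = bounds[i - 1] if i > 0 else 0
    hi = bounds[i] if i < len(bounds) else None
    return lo, hi
```

With the defaults, `log_buckets()` returns `[10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000]`, so a 7 µs cache hit and a 4.5 ms disk read each get a meaningful bucket from the same list.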

Last Edit: 4 years 6 months ago by Jeff Holt. Reason: change a minor semantic problem