A few months ago, we received two trace files representing two executions of the same program. One execution took almost 14 times longer than the other. The difference was due almost entirely to the number of “db file sequential read” calls made by the two executions. Why would two executions of the same program, processing approximately the same amount of data, present radically different “db file sequential read” call counts? One guess is that it’s pressure on the database buffer cache that’s causing the problem, but how can you prove it?
Well, with Oracle trace data and the Method R Workbench application, the proof took only a few minutes. The first thing we did was run a prepackaged report called “Duration by database block address (DBA) for /db.*read/ calls”. Here’s the report for the slow execution:
FILE:BLOCK            DURATION      %      CALLS     MEAN      MIN      MAX
----------------- ------------ ------ ---------- -------- -------- --------
122:3635207           0.176217   0.0%        127 0.001388 0.000205 0.108815
138:3375101           0.155750   0.0%        199 0.000783 0.000467 0.010330
132:3396113           0.154863   0.0%        201 0.000770 0.000622 0.004885
152:2355529           0.150312   0.0%        208 0.000723 0.000510 0.009621
125:1857501           0.148987   0.0%        189 0.000788 0.000620 0.009322
117:1428834           0.147616   0.0%        200 0.000738 0.000214 0.014285
156:2205769           0.147473   0.0%        118 0.001250 0.000439 0.077006
142:1557233           0.145838   0.0%        200 0.000729 0.000566 0.007101
137:2474845           0.145480   0.0%        199 0.000731 0.000590 0.001416
159:10395             0.145115   0.0%        159 0.000913 0.000211 0.049969
1,062,476 others  7,860.734617 100.0% 13,997,468 0.000562 0.000147 0.111807
----------------- ------------ ------ ---------- -------- -------- --------
TOTAL (1,062,486) 7,862.252268 100.0% 13,999,268 0.000562 0.000147 0.111807
Do you see the problem? Look at the CALLS column. This program execution read block 122:3635207 into the database buffer cache, using system calls, 127 distinct times. It spent 0.176217 seconds doing those 127 reads, consuming between 0.000205 and 0.108815 seconds per call.
It read lots of blocks about 200 times apiece. This is the magic of tracing: now you know, unequivocally, what happened.
The next question is, why?
It’s because of pressure on the database buffer cache. By the time this program needed to access 122:3635207 a second time (after having read it once already), that block had been aged out of the database buffer cache by some other program. So the program being traced had to read it again. The trace file shows this: every few hundred thousand lines, you’ll see another “db file sequential read” call for block 122:3635207. There are 127 such lines for this block in the trace file. And there are hundreds of other such lines for thousands of other such blocks!
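You can see this evidence for yourself in the raw trace data with a few lines of scripting. Here’s a minimal Perl sketch that tallies read calls per block, assuming the trace contains standard WAIT lines of the form nam='db file sequential read' ... file#=N block#=N (the script name and output layout are mine, not part of the Workbench):

#!/usr/bin/env perl
# count-reads.pl (hypothetical name): tally "db file sequential read"
# calls per data block address by scanning raw Oracle trace WAIT lines.
# A rough sketch only; mrskew does this (and much more) for real.
use strict;
use warnings;

my %calls;    # "FILE:BLOCK" => number of read calls for that block
while (my $line = <>) {
    # Match WAIT lines such as:
    #   WAIT #1401...: nam='db file sequential read' ela= 516 file#=122 block#=3635207 ...
    next unless $line =~ /nam='db file sequential read'.*file#=(\d+) block#=(\d+)/;
    $calls{"$1:$2"}++;
}

# Print the ten most frequently read blocks, highest call count first.
printf "%-17s %10s\n", 'FILE:BLOCK', 'CALLS';
my @hot = sort { $calls{$b} <=> $calls{$a} } keys %calls;
for my $dba (@hot[0 .. 9]) {
    last unless defined $dba;
    printf "%-17s %10d\n", $dba, $calls{$dba};
}

Run it as perl count-reads.pl on your trace file; any block showing a call count greater than 1 was re-read.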
Here’s the same report for the faster execution:
FILE:BLOCK        DURATION      %   CALLS     MEAN      MIN      MAX
--------------- ---------- ------ ------- -------- -------- --------
53:1273982        0.110119   0.0%       1 0.110119 0.110119 0.110119
84:1248015        0.099944   0.0%       1 0.099944 0.099944 0.099944
102:2950178       0.095721   0.0%       1 0.095721 0.095721 0.095721
59:1960583        0.094851   0.0%       1 0.094851 0.094851 0.094851
98:190433         0.090348   0.0%       1 0.090348 0.090348 0.090348
92:1611224        0.082210   0.0%       1 0.082210 0.082210 0.082210
45:1377638        0.065793   0.0%       1 0.065793 0.065793 0.065793
142:2384429       0.059034   0.0%       1 0.059034 0.059034 0.059034
54:261303         0.054912   0.0%       1 0.054912 0.054912 0.054912
78:52474          0.053043   0.0%       1 0.053043 0.053043 0.053043
446,838 others  263.903507  99.7% 458,229 0.000576 0.000157 0.050658
--------------- ---------- ------ ------- -------- -------- --------
TOTAL (446,848) 264.709482 100.0% 458,239 0.000578 0.000157 0.110119
This execution read each of its blocks once and only once. (There are a few non-1 values hiding in the “others” row, but not enough to worry about. Can you see how I know that?)
So, how much time did the slow execution waste on block re-reads? With the Method R Workbench’s mrskew utility, we can calculate that. It takes a little bit of fancy command line work, so we put it into a .rc file, where we can save the work for reuse. Here’s what that looks like; it’s a file called “first-read-only.rc”, which we shipped in Workbench version 9.2:
# Report only the duration of a given data block's first read into the buffer
# cache; that is, discard the durations of second and subsequent reads.
#
--init='
    # Hash of data block addresses (DBAs).
    my %h = ();
    # Return duration only if this is the first read call for the block.
    sub f($$) {
        my ($f, $b) = @_;    # file id, block id.
        if ($h{"$f:$b"}) {
            # If we have seen this DBA before, then return a zero duration.
            return 0;
        } else {
            # Otherwise, record its existence and return its duration.
            return $h{"$f:$b"} = $af;
        }
    }
'
--name='db file sequential read'
--alldepths
--group='"$p1:$p2"'
--group-label="FILE:BLOCK"
--select='f($p1,$p2)'
--select-label="DURATION-1ST-READ-ONLY"
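The trick is in the --select expression. Instead of discarding repeat reads, f returns a zero duration for them. Since mrskew aggregates whatever --select returns, the second and subsequent reads of a block still land in their FILE:BLOCK groups, but they contribute nothing to the DURATION-1ST-READ-ONLY totals.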
It looks kind of messy, but it took me only about ten minutes to write, because it’s very similar to another .rc file called “distinct-texts.rc” that we already ship with the product.
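To try it yourself, point mrskew at your trace file with the .rc file loaded, something like this (the trace file name here is just a placeholder; --rc is the mrskew option for loading an .rc file):

mrskew --rc=first-read-only.rc slow-program.trc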
Running mrskew with this .rc file produces the following result (I’ve trimmed off some irrelevant columns):
FILE:BLOCK        DURATION-1ST-READ-ONLY
----------------- ----------------------
77:527660                       0.072600
47:624193                       0.058515
45:2990164                      0.051676
74:3024653                      0.043204
44:1372296                      0.024697
67:545061                       0.019459
77:1596047                      0.019336
113:1888944                     0.019321
119:468208                      0.018722
64:1595078                      0.018314
1,062,476 others              577.876788
----------------- ----------------------
TOTAL (1,062,486)             578.222632
What does this report mean? It means that if this execution had read each block only once, then its “db file sequential read” calls would have consumed only 578 seconds instead of 7,862. That’s 7,862 − 578 ≈ 7,284 seconds, about 93% of the read-call time, spent re-reading blocks the program had already read.
The big deal about the whole Method R experience is that we can tell you precisely how your programs spend their time. We can also tell you how much of that time is being wasted. That’s important, because knowing is way better than guessing.