
Quantifying Buffer Cache Pressure

A few months ago, we received two trace files representing two executions of the same program. One execution took almost 14 times longer than the other. The difference was dominated by the number of “db file sequential read” calls that the two executions made. Why would two executions of the same program, processing approximately the same amount of data, present radically different “db file sequential read” call counts? One guess is that pressure on the database buffer cache is causing the problem, but how can you prove it?

Well, with Oracle trace data and the Method R Workbench application, the proof took only a few minutes. The first thing we did was run a prepackaged report called “Duration by database block address (DBA) for /db.*read/ calls”. Here’s the report for the slow execution:

FILE:BLOCK             DURATION       %       CALLS      MEAN       MIN       MAX
-----------------  ------------  ------  ----------  --------  --------  --------
122:3635207            0.176217    0.0%         127  0.001388  0.000205  0.108815
138:3375101            0.155750    0.0%         199  0.000783  0.000467  0.010330
132:3396113            0.154863    0.0%         201  0.000770  0.000622  0.004885
152:2355529            0.150312    0.0%         208  0.000723  0.000510  0.009621
125:1857501            0.148987    0.0%         189  0.000788  0.000620  0.009322
117:1428834            0.147616    0.0%         200  0.000738  0.000214  0.014285
156:2205769            0.147473    0.0%         118  0.001250  0.000439  0.077006
142:1557233            0.145838    0.0%         200  0.000729  0.000566  0.007101
137:2474845            0.145480    0.0%         199  0.000731  0.000590  0.001416
159:10395              0.145115    0.0%         159  0.000913  0.000211  0.049969
1,062,476 others   7,860.734617  100.0%  13,997,468  0.000562  0.000147  0.111807
-----------------  ------------  ------  ----------  --------  --------  --------
TOTAL (1,062,486)  7,862.252268  100.0%  13,999,268  0.000562  0.000147  0.111807

Do you see the problem? Look at the CALLS column. This program execution read block 122:3635207 into the database buffer cache, using system calls, 127 distinct times. It spent 0.176217 seconds doing those 127 reads, consuming between 0.000205 and 0.108815 seconds per call.

It read lots of other blocks about 200 times apiece. This is the magic of tracing: now you know, unequivocally, what happened.
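
By the way, that prepackaged report is essentially just an mrskew grouping of /db.*read/ calls by their file and block parameters. Here’s a rough sketch of an equivalent invocation, using the same options you’ll see in the .rc file below (the trace file name slow.trc is made up, and the shipped report may differ in its details):

mrskew --name='db.*read' --group='"$p1:$p2"' --group-label='FILE:BLOCK' slow.trc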

The next question is, why?

It’s because of pressure on the database buffer cache. By the time this program needed to access 122:3635207 a second time (after having read it once already), that block had been aged out of the database buffer cache by some other program. So the program being traced had to read it again. The trace file shows this: every few hundred thousand lines, you’ll see another “db file sequential read” call for block 122:3635207. There are 127 such lines for this block in the trace file. And there are hundreds of other such lines for thousands of other such blocks!
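
You can verify this against the raw trace file itself, because each such read appears as a WAIT line that names the file and block. Here’s a quick sketch (again, slow.trc is a made-up name):

grep -c "nam='db file sequential read'.*file#=122 block#=3635207" slow.trc

For this trace file, the count printed should be 127.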

Here’s the same report for the faster execution:

FILE:BLOCK         DURATION       %    CALLS      MEAN       MIN       MAX
---------------  ----------  ------  -------  --------  --------  --------
53:1273982         0.110119    0.0%        1  0.110119  0.110119  0.110119
84:1248015         0.099944    0.0%        1  0.099944  0.099944  0.099944
102:2950178        0.095721    0.0%        1  0.095721  0.095721  0.095721
59:1960583         0.094851    0.0%        1  0.094851  0.094851  0.094851
98:190433          0.090348    0.0%        1  0.090348  0.090348  0.090348
92:1611224         0.082210    0.0%        1  0.082210  0.082210  0.082210
45:1377638         0.065793    0.0%        1  0.065793  0.065793  0.065793
142:2384429        0.059034    0.0%        1  0.059034  0.059034  0.059034
54:261303          0.054912    0.0%        1  0.054912  0.054912  0.054912
78:52474           0.053043    0.0%        1  0.053043  0.053043  0.053043
446,838 others   263.903507   99.7%  458,229  0.000576  0.000157  0.050658
---------------  ----------  ------  -------  --------  --------  --------
TOTAL (446,848)  264.709482  100.0%  458,239  0.000578  0.000157  0.110119

This execution read each of its blocks once and only once. (There are a few non-1 values hiding in the “others” row, but not enough to worry about. Can you see how I know that? Hint: 458,229 calls spread across 446,838 distinct blocks averages only about 1.03 calls per block.)

So, how much time did the slow execution waste on block re-reads? With the Method R Workbench’s mrskew utility, we can calculate that. It takes a little bit of fancy command line work, so we put it into a .rc file, where we can save it for reuse. Here’s what that looks like; it’s a file called “first-read-only.rc”, which we shipped in Workbench version 9.2:

# Report only the duration of a given data block's first read into the buffer
# cache; that is, discard the durations of second and subsequent reads.
#
--init='
	# Hash of data block addresses (DBAs).
	my %h = ();
	# Return duration only if this is the first read call for the block.
	sub f($$) {
		my ($f, $b) = @_;	# file id, block id
		if ($h{"$f:$b"}) {
			# If we've seen this DBA before, then return a zero duration.
			return 0;
		} else {
			# Otherwise, record its existence and return its duration.
			return $h{"$f:$b"} = $af;
		}
	}
'
--name='db file sequential read'
--alldepths
--group='"$p1:$p2"'
--group-label="FILE:BLOCK"
--select='f($p1,$p2)'
--select-label="DURATION-1ST-READ-ONLY"

It looks kind of messy, but it took me only about ten minutes to write, because it’s very similar to another .rc file called “distinct-texts.rc” that we already ship with the product.
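
To use it, point mrskew at the .rc file and the trace file. The invocation looks something like this (trace file name made up):

mrskew --rc=first-read-only.rc slow.trc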

Running mrskew with this .rc file produces the following result (I’ve trimmed off some irrelevant columns):

FILE:BLOCK         DURATION-1ST-READ-ONLY
-----------------  ----------------------
77:527660                        0.072600
47:624193                        0.058515
45:2990164                       0.051676
74:3024653                       0.043204
44:1372296                       0.024697
67:545061                        0.019459
77:1596047                       0.019336
113:1888944                      0.019321
119:468208                       0.018722
64:1595078                       0.018314
1,062,476 others               577.876788
-----------------  ----------------------
TOTAL (1,062,486)              578.222632

What does this report mean? It means that if this execution had read each block only once, then “db file sequential read” calls would have consumed only 578 seconds instead of 7,862. In other words, the slow execution spent roughly 7,284 seconds, more than 90% of its read time, re-reading blocks it had already read before.

The big deal about the whole Method R experience is that we can tell precisely how your programs spend their time. We can also tell you how much of your time is being wasted. That’s important, because knowing is way better than guessing.

By Cary Millsap

Owner and president of Method R Corporation.
