Blog Posts

Quantifying SQL Shareability

Do you have SQL in your system that doesn’t use placeholders, like this?

  [3r3dhkb0z824v] select stuff from t where id=18432...
  [5wamvs45j6nh4] select stuff from t where id=4286...
  [ih5x9lgg492nk] select stuff from t where id=329971...

Statements like these are generated dynamically by procedural programs written in Java, PHP, C#, etc. (or even PL/SQL if you work really hard). Programs that generate SQL statements like these create a lot of performance problems:

  • They spend too much CPU time on database parse calls;
  • They serialize on library cache and shared pool latches, which diminishes your application’s scalability;
  • They abuse the library cache, causing your application to consume way more memory than it ought to;
  • And they create unnecessary network congestion.

It’s easy to detect this kind of a problem in your trace files with the mrskew utility in the Method R Workbench package.

Counting Shareable SQL Statements

The secret to counting shareable SQL statements is the mrskew $ssqlid variable (note the double-‘s’). In the example shown above, all of these similar but distinct SQL statements would have different $sqlid values (the ones shown), but they would all have the same $ssqlid value.

With just a little bit of magic, primarily with the --init, --group, and --select command line options, you can write a mrskew program that can count shareable statements:

# Count SQLID values that share the same SSQLID.
    my %h = ();
    sub f($) {
        my ($ss, $s) = @_;
        if ($h{$ss}{$s}) {
            # If we've seen this SSQLID/SQLID combo before, then return a count of zero new things seen.
            return 0;
        } else {
            # Otherwise, record its existence and return that we've seen a new one.
            return $h{$ss}{$s} = 1;

Saving the Report in an RC File

Yuk, right? There probably aren’t a whole lot of people in this world who are good today at writing these things, but fortunately, you have us at Method R for that.

Of course, for any sequence of arguments as complicated as this, you’d never want to have to figure this out twice (or even type it in twice), so this is a perfect candidate for saving in an RC file. Just echo the value of your environment variable called MRTOOLS_RCPATH to find out where mrskew is going to search for your RC files. Put a cut-and-paste copy of this program shown above in a file in one of your rcpath directories. I called my file ssql. Now, when I need it, I simply use the option --rc=ssql, and presto, I get a report that looks like this:

mrskew p1_ora_1492.trc --rc=ssql
----------  --------------  ------  -----  ----  ---  ---
4151812497              70   20.0%     70     1    1    1
3642320257              70   20.0%     70     1    1    1
2047770123              70   20.0%     70     1    1    1
1928547239              70   20.0%     70     1    1    1
1138917066              69   19.7%     69     1    1    1
3957414185               1    0.3%    349     0    0    1
----------  --------------  ------  -----  ----  ---  ---
 TOTAL (6)             350  100.0%    698     1    0    1

This table represents a program execution that Oracle’s tkprof would have reported as using 350 distinct SQL statements, and it would have reported statistics on each one of those statements individually. Each one consumes about the same total amount of time, so you’d be left looking at this list of 350 not-great-but-not-horrible-looking statements with this nothing-I-can-do feeling. But seeing the information the mrskew way tells you a lot more. This table tells you that there were really only six SQL statements that a developer actually wrote. The code then generated roughly 70 copies of each (of the first five of the six shown here) during the execution of this report.

Now I know there are no more than six places in the code that I need to go fix. And with a report like the following, I can know exactly which one to attack first:

$ mrskew p1_ora_1492.trc --name=parse --group='"[$ssqlid] ".substr($sql,0,60)' --where='$sqlid !~ /^#/'
'"[$ssqlid] ".substr($sql,0,60)'                                             DURATION       %  CALLS      MEAN       MIN       MAX
-------------------------------------------------------------------------  ----------  ------  -----  --------  --------  --------
[1138917066] select "T0"."C0" "C0", "T0"."C1" "C1", "T0"."C2" "C2", "T0".  192.792694   26.2%     69  2.794097  2.367640  4.498316
[4151812497] select "T0"."C0" "C0", "T0"."C1" "C1", "T0"."C2" "C2", "T0".  145.986808   19.9%     70  2.085526  1.556764  3.306498
[1928547239] with "INVOICE_DETAIL_SQL" as (SELECT DISTINCT RIH.REF_INVOIC  137.010170   18.6%     70  1.957288  1.569761  2.691591
[3642320257] with "INVOICE_DETAIL_SQL" as (SELECT DISTINCT RIH.REF_INVOIC  129.994236   17.7%     70  1.857061  1.364792  2.628600
[2047770123] with "INVOICE_DETAIL_SQL" as (SELECT DISTINCT RIH.REF_INVOIC  125.465926   17.1%     70  1.792370  1.333797  2.915557
[3957414185] SELECT DISTINCT RIH.REF_INVOICE_NUMBER ,RIH.INVOICE_NUMBER ,    4.176363    0.6%    349  0.011967  0.000000  4.135371
-------------------------------------------------------------------------  ----------  ------  -----  --------  --------  --------
TOTAL (6)                                                                  735.426197  100.0%    698  1.053619  0.000000  4.498316

Ha! I might have gotten lucky. Looks like there are really only three SQL statements in this application worth bothering over. Now I know exactly which one to look for first in my application code.

By Cary Millsap

Owner and president of Method R Corporation.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: