Programmers who use profilers write better code than programmers who don’t. In the old days, the only way to profile an Oracle application we were writing (or fixing) was to trace our code and then use Oracle’s tkprof to make sense of the detailed trace data. I have Oracle course material from 1990 that shows tkprof output with elements circled and little notes explaining what they meant. It’s what the best SQL developers of the day used to refine and optimize their code.

But tkprof has lots of problems. First, it’s a SQL profiler, not an application profiler. That means that it accounts for time spent in SQL statements, but it doesn’t try to account for time spent outside of SQL statements. It also ignores a lot of what’s in your trace file. For example, it ignores parent-child relationships among database calls (and OS calls), and it pays no attention to the passage of time that isn’t accounted for by DB or OS call durations. It double-counts time. That’s why tkprof might show that your 10-second experience consisted of 8 seconds of DB call time and 6 seconds of “wait” time. Another problem is that tkprof aggregates when we don’t want it to, like showing only call counts and totals with no min/max or other skew information. It also doesn’t aggregate when we do want it to, like when it treats SQL statements that differ only in literal values as completely distinct.

There was a lot more information in our trace files than tkprof could show us, so we’d study the raw trace data with a text editor. Eventually, my colleague Jeff Holt got tired of studying raw Oracle trace data with a text editor, so he wrote a better profiler. That profiler ultimately grew up to be the Method R Profiler. Over the years, my staff and I have created a software tools ecosystem around our Profiler. These tools often give application developers and operational managers (DBAs, Unix admins, SAN admins, network admins, etc.) the first direct view of their end-users’ response time experiences that they’ve ever had in their careers.

The Method R Profiler: Simplicity, Structure, Guidance

We designed the Method R Profiler in the early 2000s, as I mentioned, in response to the deficiencies of Oracle’s tkprof profiling tool. Our design goal for the Profiler was to create an easy-to-read story that a trace file has to tell you about someone’s performance experience. That story had to be not just the SQL part of the experience, but the whole experience. We chose XHTML as the output format of the Profiler, because with that format, we could more expressively convey aspects of structure and priority that would help us better tell you that story.

The Profiler guides you through analyzing your trace files. The layout and the hyperlinks in a Profiler report give you the right structure for your analysis. This structure makes your analysis work repeatable and easy to learn. The orders in which we sort things and the colors we use in the report focus your attention on what you should look at first. We have even gone to great lengths to elide (to not display) information that would be irrelevant to your analysis and that might lead you to waste your time.

The Profiler is easy to run, too. The only input it needs from you is the file you’d like it to help you analyze, which you can drag and drop right into the Profiler GUI. Everything else is automatic. By contrast, when you use tkprof, you have to make decisions (like sort orders, whether you want to see recursive SQL, how/whether you want to connect to the database when tkprof runs, ...) that require you to know what kind of problem you’re looking for before you even begin looking at your data.

To summarize:

  1. Method R Profiler gives you the story your trace file has to tell: a complete account of response time for the end-user experience you want to analyze.
  2. It guides you by using structure, color, and elision.
  3. It’s easy to learn.
  4. It’s easy to run.

The Profiler is simple on the outside but sophisticated on the inside. Most people who’ve used only tkprof don’t understand (because tkprof conceals it from them) that trace files show the passage of time that the Oracle Database can’t explain. When these unexplained durations are small, they don’t matter, but when they’re large, they matter a lot. The Profiler reports on all of these unexplained durations, and it prioritizes them for you so that when they’re important you see them, and when they’re not you don’t. We’ve learned over the years that it’s vital to report on exactly the duration described in your trace file, and the Profiler does that meticulously.

The Profiler does this all automatically, without your having to ask. That automatic-ness is why the Profiler is the first tool we (Jeff and I and our colleagues at Method R Corporation) use to analyze trace data. The tool is mature, sophisticated, and in more than 90% of cases, it tells us everything we need to know so we can move into the remedial phase of a performance improvement project.

mrskew: Flexibility, Versatility

Sometimes, though, we need more. For starters, think about this: when you’ve built the only Oracle response time profiler in the world, how do you check to make sure that it is working properly? For several years, we desk-checked parts of our work against tkprof. But tkprof covers such a small proportion of what the Profiler does that we needed more sophisticated tools to help us check our work.

Desk-checking is just the tip of the iceberg. What if you wanted to see trace data structured another way, like listing statements in the order they were processed instead of in order of their response time contribution? What if you wanted to search across trace files to learn about call latencies across user experiences? Or what if you wanted to know which SQL statements were consuming the most database PARSE call duration out of a whole directory full of trace files? We also ran into problems with how people were collecting their trace data. What if someone collected trace data for 20 minutes, but you wanted to analyze only a 3-minute portion of the trace file? What about the connection pooling problem: when end-user experiences generate trace data that spans multiple trace files?

Each of these stories represents a useful feature, but each one contradicts the Profiler’s design goal (to render a complete account of a single end-user experience), and each would require a fundamental redesign and rewrite of the Profiler itself. Rather than risk breaking the Profiler, which does its job extremely well, we built an ecosystem of tools around the Profiler to help you do the extra things you need. This system of tools is called Method R Tools (MR Tools, or “Mister Tools,” for short).

The most interesting tool in the MR Tools kit is mrskew. mrskew is a general-purpose query tool for Oracle trace files. Because it has sorting, grouping, and filtering capabilities, you can use it to mine just about anything you can think of out of a trace file. You can find out, for example:

  • How much response time is consumed by each database or OS call name?
  • How many rows are returned by each FETCH call?
  • Which SQL statements spent the most database PARSE call time?
  • Which database files account for the worst I/O call latencies?
  • Where in your trace files are the biggest gaps in time that Oracle can’t account for?
  • During which database calls did your user’s process spend the most time preempted?
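
To make the first of those questions concrete, here is a minimal Python sketch of the kind of grouping and sorting such a query involves. It assumes only the two most common extended SQL trace line shapes (WAIT lines and database call lines); mrskew itself understands far more of the trace file format than this, and its query language means you never have to write code like this yourself.

    import re
    import sys
    from collections import defaultdict

    # Rough patterns for two common extended SQL trace line shapes (an assumption;
    # real trace files contain more variety than this sketch handles):
    #   WAIT #5140792328: nam='db file sequential read' ela= 1234 ...
    #   FETCH #5140792328:c=0,e=123,p=0,cr=1,cu=0,mis=0,r=1,dep=0,...
    WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']+)' ela= *(\d+)")
    CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH|CLOSE) #\d+:.*?\be=(\d+)")

    def elapsed_by_call_name(paths):
        """Sum reported elapsed microseconds per call name across trace files."""
        totals = defaultdict(int)
        for path in paths:
            with open(path, errors="replace") as f:
                for line in f:
                    m = WAIT_RE.match(line) or CALL_RE.match(line)
                    if m:
                        totals[m.group(1)] += int(m.group(2))
        return totals

    if __name__ == "__main__":
        totals = elapsed_by_call_name(sys.argv[1:])
        for name, ela in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{ela / 1_000_000:12.6f}s  {name}")

Even a toy like this makes the point: a trace file is a database of timed calls, and the interesting questions are all sort-group-filter questions.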

mrskew is an unbelievably versatile tool. We’ve used it to do things we never imagined it could do when we designed it. One of our customers wanted to show response time grouped by SQL statement, but he wanted the grouping function to consider SQL statements that differed only by their hints as identical. mrskew can do that. Another customer wanted to group not by call names, but by sequences of calls associated with the same PARSE or EXEC call. mrskew can do that, too. When we designed mrskew, we had no idea it would be asked to do things like these. mrskew can even tell you the database buffer cache hit rate for every single database call in your trace file. ...Not that you should want to go do that.
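
To illustrate just the “ignore the hints” part of that first story (an illustration of the idea, not of mrskew’s actual implementation), a grouping key that strips optimizer hint comments before comparing statements is enough to make hint-only variants collapse into one group:

    import re

    # Strip optimizer hint comments (/*+ ... */ and --+ ... to end of line) so that
    # SQL statements differing only in their hints produce the same grouping key.
    # Illustrative only; mrskew's own grouping expressions do the real work.
    HINT_RE = re.compile(r"/\*\+.*?\*/|--\+[^\n]*", re.DOTALL)

    def hintless_key(sql_text):
        return " ".join(HINT_RE.sub(" ", sql_text).split()).lower()

    # Both statements fall into the same group.
    a = "SELECT /*+ index(e emp_pk) */ ename FROM emp WHERE empno = :1"
    b = "SELECT ename FROM emp WHERE empno = :1"
    assert hintless_key(a) == hintless_key(b)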

One of the most important things about mrskew is that it doesn’t have to take just one trace file as input; you can feed it hundreds of trace files at once if you want, and it will sort, group, and filter across them. This is how we answer questions like, “Does our system ever incur library cache misses on EXEC or FETCH calls?” Finally, one of the neatest things about mrskew is how easy it is to copy and paste the reports it makes right into the documents or slides you’re going to use as the “deliverables” for your work.
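
As a sketch of how a question like the library-cache-miss one maps onto the trace data (assuming the common call-line format, in which the mis= field records the library cache misses a call incurred), you could scan every EXEC and FETCH line across all the files and report any nonzero value:

    import re
    import sys

    # EXEC/FETCH lines carry a mis= field: the number of library cache misses
    # the call incurred. Any nonzero value, in any file, answers the question.
    MISS_RE = re.compile(r"^(EXEC|FETCH) #\d+:.*?\bmis=(\d+)")

    def library_cache_misses(paths):
        hits = []
        for path in paths:
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    m = MISS_RE.match(line)
                    if m and int(m.group(2)) > 0:
                        hits.append((path, lineno, m.group(1), int(m.group(2))))
        return hits

    if __name__ == "__main__":
        for path, lineno, call, misses in library_cache_misses(sys.argv[1:]):
            print(f"{path}:{lineno}: {call} with mis={misses}")

With mrskew, the same answer is a single query across a whole directory of files, with the grouping and sorting already done for you.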

Method R Workbench

Method R Corporation sells Method R Profiler and Method R Tools together as a package called the Method R Workbench. With the Method R Workbench, you can accomplish remarkable workflows like these:

Routine performance testing during development

  1. With not even a single extra click, use the Method R Trace extension for Oracle SQL Developer to collect perfectly scoped Oracle trace data for the combined SQL and PL/SQL code path you’re writing.
  2. Double-click the trace file that Method R Trace has automatically copied onto your workstation to create a perfect profile using the Method R Profiler.
  3. If the Profiler doesn’t give you as much detail as you’d like, dive into the trace data using mrskew to create histograms (themselves profiles) that help you understand exactly where your code is spending time, and exactly where your best leverage is for improving your code’s performance.

Poorly scoped trace data

  1. Profile a trace file, but learn that the trace data is poorly time-scoped; that is, that it includes a significant duration that wasn’t really part of the end user’s experience you’re trying to diagnose.
  2. Use mrskew to see exactly which lines you wish you didn’t have in your trace file.
  3. Use mrcallrm to zero out the durations of the calls you wish you hadn’t collected in the trace data. mrcallrm also automatically adjusts all of your clock values in the trace file so that the next time you run the Profiler on the file, you’ll see a profile for the exact duration you were looking for.
  4. Use mrskew to drill into the SQL statements, or database calls, or OS calls that consumed the most response time, all the way down to the individual call level of detail, if you want.

Connection pooling

  1. You can’t use the Method R Profiler on an individual user’s experience in an application that uses connection pooling, because there’s no single trace file that you can use as input to represent the response time experience.
  2. But if your application sets a unique CLIENT_IDENTIFIER for each distinct end-user experience (that is, for each distinct execution of a piece of your application code path), then use a simple mrskew query to list the personal response time experiences of your users in a table. (A sketch of the CLIENT_IDENTIFIER instrumentation appears after this list.)
  3. Use mrskew to drill into any aspect of any of those personal response time experiences you want to drill into, for any experience you’ve traced.
  4. Use mrskew and Oracle’s trcsess tool to create a trace file that contains exactly the experience you want to profile, and then use the Method R Profiler to create a perfect profile.
  5. If your application doesn’t use CLIENT_IDENTIFIER instrumentation that makes the performance analysis job easy, then use mrskew’s ability to group by sequences of database calls associated with a given PARSE or EXEC call to approximate your end-users’ experiences.
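
Step 2 of this list depends on the application tagging each end-user experience with its own CLIENT_IDENTIFIER. Here is a minimal sketch of that instrumentation using the python-oracledb driver’s client_identifier connection attribute; the pool, table, and naming scheme here are assumptions about your application, and other Oracle drivers expose an equivalent setting:

    import uuid
    import oracledb  # python-oracledb; assumes a pool created elsewhere with oracledb.create_pool(...)

    def run_report(pool, user_name):
        """One end-user experience, tagged so its trace data can be isolated later."""
        with pool.acquire() as conn:
            # A unique tag per experience. Oracle records it in the trace data,
            # so mrskew (or Oracle's trcsess) can carve this experience back out.
            conn.client_identifier = f"{user_name}:{uuid.uuid4()}"
            with conn.cursor() as cur:
                # Hypothetical application query; substitute your own code path.
                cur.execute("select count(*) from orders where owner = :1", [user_name])
                return cur.fetchone()[0]

The exact shape of the code matters less than the discipline: one unique identifier per end-user experience, set before that experience’s database calls begin.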

Summary

The epitome of control over application performance is when you can see individual end-user response time experiences and drill into or across the measurements of those experiences to any level of detail that you want. This level of control is what you get with the Method R Workbench software package, which includes:

  • Method R Profiler gives you simplicity, structure, and guidance. It shows you the performance story that’s hidden inside an Oracle trace file.
  • Method R Tools gives you flexibility and incredible versatility. It can show you anything your trace files have inside them, however you want to see it.
  • Method R Trace gives you perfectly scoped Oracle trace files from Oracle SQL Developer with no extra effort.

It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.
—Donald Knuth