Categories: Blog Posts, Release Announcements

Method R Workbench 9.4

Yesterday, we released Method R Workbench 9.4.1.0. This article is a tour of some of the 50 new application behavior changes.

Command History Scrolling

In Workbench 9.3, it was a pain to try to find a given report in the output pane. For example, do this:

  1. Run the “Time by line range” action upon a 1,000-line trace file, with 1- as the range. The output will be 1,000 lines.
  2. Run the “Duration by call name” action upon the same file. The output will be about 15 lines.
  3. Run the “Time by line range” action again. Another 1,000 lines.
  4. Now try to find the “Duration by call name” report, which is buried between two 1,000-line reports.

This was a fun design problem. My initial idea was a “table of contents” dialog. It would list the actions that had been run, and then clicking on one would scroll the output pane back to the first line of that action’s output. But I hate dialogs.

It sucks when you hate your own best idea. But that’s what teams are for. Jeff proposed a far better idea: simply link the command history feature with the output pane. So now, finding that “Duration by call name” report is as easy as:

  1. Arrow up and down through the command text field until you get to the action whose report you want. Then, presto!, there’s its report in the output pane.

We want our software to feel like “PFM” (pure freaking magic). This design element is one of those opportunities. It wasn’t easy to implement, given that the size of the output pane is variable (View › Zoom), but that’s our problem to worry about, not yours.

CSV for Importing into Excel

One of the stories I like to tell is about the time we helped a big company fix an overnight batch job duration problem by exporting our files pane content into CSV that they could copy and paste into Excel. As a result of that engagement, we added a new runnable action called “Details by file, for importing into Excel”. It used our mrls utility. The problem is, mrls is slower and less accurate than mrprofk, the utility that creates the information for the files pane.

There’s no need to be slow and wrong when you can be fast and right. So we eliminated the action in the actions pane, and we added a shift-click feature to the existing “Copy selected file rows to output” button. Now it’s the “Copy selected file rows to output (shift-click for CSV)” button. It’s both accurate and blazing fast.

Sounds

Clearly, most of the information transfer from the Workbench application to your brain occurs through your visual channel. But one day in July 2020, one of our customers said this:

Running mrskew is so fast and the resulting display update so fluid I often cannot even tell if it ran…

After studying the problem a little bit, we decided to begin using sound as an additional channel for transmitting information from our Workbench application to your brain. When you run a command that might take a while, you’ll hear a little thumbs-up or thumbs-down sound when it’s finished, so you can look away from the screen and still know definitively that your action has completed. We added that feature in August 2020, in Workbench 9.1.

One thing had been bugging me, though: the sounds didn’t really fit in with the other sounds my system makes. On macOS and Windows, there’s actually a system settings option that allows you to specify an alert sound for your system. You get to choose which sound you want, how loud it should play, and which device it should play on. Prior to 9.4, our Workbench application didn’t respect those settings. In 9.4, we do.

It’s nice to fit in.

Undo and Redo

We worked long and hard on undo and redo in this release. Prior to 9.4, undo was restricted to the files pane. You could undo box-checking, and you could undo load and unload operations. We found the feature confusing to use, though, for a variety of reasons. At the same time, we didn’t offer any undo features in the command text field (the text field at the top of the output pane).

So, in 9.4.1, we’ve added good old-fashioned undo and redo with ⌘Z and ⇧⌘Z (Ctrl-Z and Shift-Ctrl-Z on Windows) in the command text field and both the filter fields.

Move to Trash

Sometimes in our Workbench workflow, we come across files that we know we’re never going to want to load again: files with zero durations, or files with trace level 0 (see our new “Level” column, which reveals each file’s Oracle tracing level). Going to the filesystem browser to delete those files is tedious and dangerous; I hate look-here, click-there workflows. It’s just easier to select the files you don’t want and delete them with File › Move to Trash (Move to Recycle Bin on Windows), right in the Workbench application.

It may sound dangerous, but don’t worry, Move to Trash just moves items to your trash can. If you make a mistake, you can still retrieve them.

If you don’t see the Move to Trash (or …Recycle Bin) option in your File menu, then you should update your JDK version to 9+. You can see the version you’re using by clicking Help › Diagnostics.

Label Expressions

In my August 2023 article called “A Design Decision,” I talked about an enhancement that lets you use expressions in --group-label and --select-label values. What we didn’t realize at the time was that, in those expressions, mrskew wouldn’t let you refer to functions you’d defined yourself in the mrskew --init block. In 9.4.1, we’ve fixed that problem.

It’s a neat feature. We use it now in our three histogram RC files (disk.rc, p10.rc, ssd.rc) to make our code more elegant. The ability to use --group-label='label(0)' is the trick. I expect our users to push the feature even harder. See, for example, Jared Still’s amazing --init blocks on Slack.
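To illustrate the kind of thing this unlocks, here’s a hypothetical sketch; the helper’s name and body are my own, not taken from the shipped RC files. You define a subroutine once in the --init block, then call it from a label expression instead of repeating literal label strings:

```shell
# Hypothetical sketch: define a Perl helper in the --init block, then call
# it from --group-label. The function name and its body are illustrative,
# not copied from disk.rc, p10.rc, or ssd.rc.
mrskew --name='read' \
       --init='sub label { my ($i) = @_; return "FILE#:BLOCK#" }' \
       --group='"$p1:$p2"' \
       --group-label='label(0)' \
       *.trc
```

The win is the usual one for extracting a function: the RC files can compute their labels in one place instead of duplicating strings across options.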

And More…

In the past few years, we’ve increasingly been required to carve through hundreds or thousands of trace files. We’re talking about so many files that referring to “*” on the command line overflows the command-line buffer. So each release in 2023 and 2024 has included features to make managing lots of trace files easier. You can read all about what we’ve done on our Workbench Release Notes page.

I hope you enjoy!

Categories: Blog Posts

How Did You Make mrskew 20× Faster?

A couple of years ago (June 30, 2020), we released Method R Workbench version 9.0.0.66. It had 113 new features and bug fixes. One of those features was case 7800: “mrskew is now 10–20× faster than before.”

We’re prone to sneaking in performance improvements like that. It’s because we, too, use the software we sell, and we don’t like waiting around for answers any more than you do.

mrskew, in case you don’t know, creates flexible, variable-dimension profiles. It’s a skew analyzer for Oracle trace files.

…A what?

It’s a tool that can query across trace files (thousands of them, if that’s how many you have) and answer questions like these:

  1. What kinds of calls dominated the response time of your user’s experience? Imagine for the sake of this example that the answer is “read calls.” How much time did read calls take? How many read calls did your program make?
  2. Were all your read calls the same duration? Or did some take longer than the others? How much time could you save if you eliminated the slowest 10,000 read calls?
  3. How many blocks did the longest read calls read?
  4. What are the file and block IDs of the longest read calls?
  5. Are the slowest read calls associated with a particular file?
  6. Are they associated with a particular SQL statement?
  7. On what line of what trace file can you find information about your longest read call?

Our mrskew tool can answer questions like these and more.

Here are the commands to do it. Don’t let these scare you. You can summon any one of them (or any of 30+ others) with just a click, in our Method R Workbench:

  1. mrskew *.trc
  2. mrskew --name='read' --rc=p10.rc *.trc
  3. mrskew --name='read' --group='"$p3"' --gl='"BLKS/READ"' *.trc
  4. mrskew --name='read' --group='"$p1:$p2"' --gl='"FILE#:BLOCK#"' *.trc
  5. mrskew --name='read' --group='"$p1"' --gl='"FILE"' *.trc
  6. mrskew --name='read' --group='"$sqlid"' --gl='"SQLID"' *.trc
  7. mrskew --name='read' --group='"$base:$line"' --gl='"FILE:LINE"' *.trc

Now, imagine trying to ask 2GB of trace data all these questions. Without mrskew, it would probably take you a day or more to fish the answers out of your trace files (don’t bother looking in AWR or ASH; they’re not there).

A Workbench 8 mrskew execution on 2GB of input takes about 4 minutes. That’s about half an hour to run all seven commands. That’s pretty good compared to a day or two of fishing.

A Workbench 9 mrskew execution on the same input takes only about 12 seconds. That’s less than 2 minutes to answer all the questions I’ve posed here. That’s remarkable.

  2019 MacBook Pro (Intel)    mrskew *.trc (2GB)
  Method R Workbench 8        240 seconds (4 minutes)
  Method R Workbench 9        12 seconds

  mrskew execution times before and after the Method R Workbench 9 upgrade.

So, an interesting question, then, might be, “How did you do that?”

Well, that’s easy: a long time ago, I hired Jeff Holt.

How did Jeff do it?

Simple. He rewrote mrskew in C.

In Workbench 8, mrskew was a Perl program that I had written in 2009. Perl is admittedly slow, but I was interested in having a program that users could interact with using Perl’s full expression syntax.

mrskew worked really well, and we used it a lot. But it always felt weird that it was so much slower than our other utilities that do even more work (like mrprof). So Jeff, in his spare time, investigated whether he could rewrite mrskew in C. It was no small feat, given that I insisted upon keeping the full Perl expression interface.

One day he surprised me: the new C version of mrskew was passing all our automated tests, and we could probably ship it. I asked him how much faster it was. He said about 20×.

I’m used to this kind of thing with Jeff by now. But still.

The result of Jeff’s investigation is that now we have a skew analysis tool that works just as fast as our other outrageously fast tools, even when you’re battling data by the gigabyte. Today, mrskew is a standard feature of pretty much every performance improvement project we hook into, and we’re grateful that it doesn’t make us wait a long time for the answers we need.

To get a better understanding of what skew is and why it’s important, see chapter 38 of How to Make Things Faster. If you’re interested in more detail about mrskew, visit our mrskew manual page.

Categories: Blog Posts

Coherency Delay

Today, a reader of How to Make Things Faster asked a question about coherency delay (chapter 47): How does coherency delay manifest itself in trace files?

My own awareness of the concept of coherency delay came from studying Neil Gunther’s work, where he defines it mathematically as the strength β that’s required to make your scalability curve fit the Universal Scalability Law (USL) model.

Such a definition can be correct without being especially helpful. It doesn’t answer the question of what kind of Oracle behaviors cause the value of β to change.

I think that one answer may be “the gc stuff.” But I’m not sure. One way to find out would be to do the following:

  1. Find a workload that motivates a lot of gc events on a multi-node RAC system.
  2. Fit that configuration’s behavior to Gunther’s USL model.
  3. Run the same workload on a single node.
  4. Fit that configuration’s behavior to USL.
  5. If the value of β decreases significantly from the first workload to the second, then there’s been a reduction in coherency delay, and there is a strong possibility that the gc events were the cause of it.
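For reference, here is the form of Gunther’s USL that the steps above would fit against. Gunther’s notation varies across publications (σ and κ in some); I’ll use α and β here to match the chapter’s β. Relative capacity C(N) at concurrency N is:

```latex
% Universal Scalability Law.
% \alpha models contention (queueing for shared resources);
% \beta models coherency delay (the cost of keeping shared data consistent).
C(N) = \frac{N}{1 + \alpha\,(N - 1) + \beta\,N\,(N - 1)}
```

Note that the β term grows quadratically with N, which is why even a small coherency cost eventually makes capacity decrease as you add nodes; a significant drop in fitted β between the two-node and one-node fits is exactly what step 5 is looking for.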

That may not feel like a particularly practical proof (it would require a lot of assets to conduct), but it’s the best proposal I can think of here at the moment.

The biggest problem with executing this “proof” (whether it really is a proof or not is subject to debate) is that there’s probably not much payoff at the end of it. Because what does it really matter if you know how to categorize a particular response time contributor? If a type of event (like “gc calls”) dominates your response time, then who cares what its name or its category is? Your job is to reduce its call count and call durations.

(Lots of people have already discovered this peril of categorization when they realized that—oops!—an event that contributes to response time is important, even if the experts have all agreed to categorize it using a disparaging name, such as “idle event.”)

Whether the gc events are aptly categorizable as “coherency delays” or not, my teammates and I have certainly seen RAC configurations where the important user experience durations are dominated by gc events. In fact, our very first Profiler product customer back in roughly the year 2000 was having ~20% of their response times consumed by gc stuff on a two-node cluster back when RAC was called Oracle Parallel Server (OPS).

We solved their problem by helping them optimize their application (indexes, SQL rewrites, etc.), so that their workload ended up fitting comfortably on a single node. When they decommissioned the second node of their 2-node cluster, their gc event counts and response time contributions dropped to 0. And another department got a new computer the company didn’t have to pay for.

The way you’d fix that problem if you could not run single-instance is to make all your buffer cache accesses as local as possible. It’s usually not easy, but that’s the goal. And of course, RAC does a much better job of minimizing gc event durations than OPS did 23 years ago, so overall it’s not as big a problem as it used to be.

Bottom line, it might be an interesting beer-time conversation to wonder whether gc events are coherency delays or not, but the categorization exercise is only a curiosity. It’s not something you have to do in order to fix real problems.

Categories: Blog Posts, Videos

Fill the Glass

Today, Cary Millsap hosted the inaugural episode of his new weekly online session, called “Fill the Glass.” Episode 1 was an ask-me-anything session, covering topics including how to access the Method R workspace in Slack, advice about being your own publisher, and our GitHub repository (available now) for Cary’s and Jeff’s new book, “Tracing Oracle” (available soon).

Visit our “Fill the Glass” page for access to past recordings and future live sessions.

Categories: Blog Posts, Videos

Insum Insider: How to Optimize a System with Cary Millsap

Today, Michelle Skamene, Monty Latiolais, and Richard Soule of Insum chatted with me for an hour on their live stream. I hope you’ll enjoy it as much as I did.