Last week, I started seeing a 30-45 second hang when right-clicking on large .zip files. Troubleshooting this kind of slow system performance usually takes one of two forms:
- Blame the anti-virus and grumble about it.
- Systematically try to modify state (remove programs, muck with the registry, etc.) to return the system to a “good” state.
Now that I work at an AV company, the first doesn’t seem particularly useful, and the second has never been the most reliable. There has to be a better way!
Wait! There is a better way: use the Windows Performance Toolkit (WPT).
The WPT is a free toolkit that allows you to record system traces and analyze them to track down performance issues. Older versions of the toolkit relied on the arcane syntax of xperf. The Windows 8 WPT simplifies this with the Windows Performance Recorder and Windows Performance Analyzer tools. The tools also run on Windows 7.
So, let’s see it in action. There’s an article on installing the WPT over at the Microsoft Premier Field Engineering blog and a great ‘getting started’ post over at Random ASCII. Setting up symbols is also important, because without them the stack traces show only module names instead of function names. I had some problems with this, but brucedawson helped (read the comments).
Step 1: Record the trace using Windows Performance Recorder. Since I am on Windows 7 x64, I let the tool disable the paging executive and did a reboot (this is not required on later operating systems).
Step 2: Reproduce the right-click hang and then save the performance trace to an .etl file.
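I did Steps 1 and 2 in the Windows Performance Recorder GUI, but the same recording can probably be scripted with the wpr.exe command-line tool that ships with the WPT. Here is a minimal Python sketch of that idea, assuming wpr.exe is on the PATH and the script runs from an elevated prompt. The function names and the output file name are mine, purely for illustration.

```python
import subprocess


def wpr_start_command(profile="GeneralProfile"):
    """Build the command that starts a WPR recording with a built-in profile."""
    return ["wpr", "-start", profile]


def wpr_stop_command(etl_path):
    """Build the command that stops the recording and saves it to etl_path."""
    return ["wpr", "-stop", etl_path]


def record_trace(etl_path, profile="GeneralProfile"):
    """Record a trace around a manual repro (Windows only, needs elevation)."""
    subprocess.run(wpr_start_command(profile), check=True)
    try:
        # Right-click the large .zip file now, wait for the hang to finish,
        # then come back to this prompt.
        input("Reproduce the hang, then press Enter to stop recording... ")
    finally:
        # Always stop the session so it does not keep recording in the background.
        subprocess.run(wpr_stop_command(etl_path), check=True)
```

Calling record_trace("rightclick_hang.etl") (a made-up file name) would leave the .etl file in the current directory, ready to open in Step 3.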
Step 3: Open the performance trace in Windows Performance Analyzer. The trace should be in <My Documents>\WPR Files. Look at all the pretty graphs!
The first thing to notice is the graph under System Activity called UI Delays. In this case, it shows explorer.exe being delayed twice. The duration of the second delay is about 46 seconds.
At this point, the endeavour becomes an exercise in data mining through 150 MB of information. The wait analysis episode of Defrag Tools helped me through it. The key idea here is that our trace file contains two kinds of stack info: CPU Usage (Sampled) is a snapshot of the stack running on each CPU every 10 ms, and CPU Usage (Precise) is a record of the stack running on each CPU every time there is a context switch. In this case, we’re trying to figure out why the UI is hanging, so we care more about the latter.
So, we open up the CPU Usage (Precise) graph, and display the table data. We want to view the data in terms of process/thread/stack, and we also want to know if the thread is waiting on another thread. The key piece of information here is the max Wait time. This will allow us to drill down and see the exact function call that is slowing us down.
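To make the grouping concrete, here is a rough Python sketch of what that table view is doing for us: for one process, keep the single longest wait per thread and sort the threads by it. WPA does this interactively; the row layout, thread IDs, stack labels, and wait values below are all made up for illustration and are not taken from my trace.

```python
# Hypothetical context-switch rows, loosely modelled on the process, thread,
# stack, and wait columns of the CPU Usage (Precise) table. All values are
# invented for this example.
SAMPLE_ROWS = [
    {"process": "explorer.exe", "thread_id": 4012, "stack": "stack_A", "wait_us": 46_000_000},
    {"process": "explorer.exe", "thread_id": 4012, "stack": "stack_B", "wait_us": 1_200},
    {"process": "explorer.exe", "thread_id": 5120, "stack": "stack_C", "wait_us": 300_000},
    {"process": "svchost.exe", "thread_id": 880, "stack": "stack_D", "wait_us": 90_000_000},
]


def max_wait_by_thread(rows, process):
    """Return the longest single wait of each thread in `process`, worst first."""
    worst = {}
    for row in rows:
        if row["process"] != process:
            continue
        tid = row["thread_id"]
        # Keep only the row with the largest wait seen so far for this thread.
        if tid not in worst or row["wait_us"] > worst[tid]["wait_us"]:
            worst[tid] = row
    return sorted(worst.values(), key=lambda r: r["wait_us"], reverse=True)


for row in max_wait_by_thread(SAMPLE_ROWS, "explorer.exe"):
    print(row["thread_id"], row["stack"], row["wait_us"] / 1_000_000, "s")
```

The thread at the top of that list is the one worth expanding first, since its stack shows the call that sat waiting the longest.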
Now, it’s just a matter of starting with the explorer.exe thread that the UI Delays table points to (it will have a high max wait time) and following the trail until we hit our problem function or a wait on another thread. In the second case, we repeat the process on that thread. The first thread I followed ended with a wait on another thread.
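Following the trail is essentially walking a chain: each waiting thread points at the thread that eventually woke it up, and you keep hopping until a thread’s longest wait is not explained by another thread. A small Python sketch of that walk, assuming we have already noted each thread’s longest wait by hand. The thread IDs and the mapping are hypothetical, and the real hops happen by clicking through WPA.

```python
# Hypothetical summary of each thread's longest wait:
# thread id -> (id of the thread that readied it, or None; top frame of interest)
SAMPLE_WAITS = {
    4012: (5120, "waiting on another thread"),
    5120: (6044, "waiting on another thread"),
    6044: (None, "PGExtension.dll!<no symbols>"),
}


def follow_wait_chain(longest_wait, start_tid):
    """Walk from start_tid through readying threads; return (thread id, frame) hops."""
    chain = []
    seen = set()
    tid = start_tid
    # Stop when there is no readying thread, when the thread is not in our
    # notes, or when we would revisit a thread (two threads waking each other).
    while tid is not None and tid in longest_wait and tid not in seen:
        seen.add(tid)
        readied_by, frame = longest_wait[tid]
        chain.append((tid, frame))
        tid = readied_by
    return chain


for tid, frame in follow_wait_chain(SAMPLE_WAITS, 4012):
    print(tid, frame)
```

The last entry in the chain is where to look for the slow call; everything before it was just waiting on the next thread along.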
After going through a few threads, I eventually found the culprit: PGExtension.dll.
PGExtension.dll is a third-party .dll, so we don’t have symbols for it in the trace. A quick search on my system shows that it belongs to Avecto Privilege Guard, which is part of the corporate image where I work. This tool lets non-admins perform certain administrative functions, like installing white-listed programs. Luckily, I have been granted Administrator access on my box, so I uninstalled Privilege Guard and the problem went away.