Debugging Windows Service Startup Using Procdump

tl;dr: Sleeping and attaching a debugger? Meh. Writing copious log files? Meh. In the case of a crashing service, it’s much easier to collect the crashdump and analyze.

If you’ve spent much time developing Windows Service, you’ve probably run into the case where your service mysteriously crashes while it is starting. In cases like these, Windows isn’t always particularly helpful. For example, here’s what Windows has to say about my Crashy Service:

service_crash_eventlog

“The Crashy Service service terminated unexpectedly.” Thanks Windows!

Since the crash is happening during service startup, it’s a little tricky to attach a debugger (or downright impossible in customer environments). Instead, as an exercise, let’s use the Sysinternals procdump tool to collect the crash dump and Visual Studio 2013 to figure out what went wrong.

Here are the steps:

1. We set procdump as the system postmortem debugger using the -i flag. This allows procdump to write out the state of the process before it crashes.

service_crash_procdump

2. Now we reproduce the service startup crash. Before the service crashes, procdump will write the crashdump to the directory where we ran the -i command in the previous step.

3. (Optional) Revert the system postmortem debugger to the previous value using the procdump -u command.

service_crash_procdump_uninstall

4. Open the .dmp crashdump file in Visual Studio 2013. When I do this, I see an initial screen that tells me a bit more about the exception that led to the crash. I then have the option of digging into the crashdump via the “Debug with Native Only” action.

service_crash_vs00

service_crash_vs0

5. When I click on Debug with Native Only, I get access to the call stack as well as the exact line of code that caused the crash (give or take — nobody’s perfect!) In this case it appears that I’m trying to lower case a null string, which throws an access denied exception.

service_crash_vs1

service_crash_vs2

More info: I’m analyzing the binary on the same machine where I compiled it. In other situations, you will have to tell VS where your symbols are. Also, in general I would probably use WinDbg to analyze the crashdump as I have way more experience with it. !analyze -v is your friend.

Advertisements
Posted in Debugging, Sysinternals Tools | Leave a comment

Performance Analysis Toolbox: Timeouts

Music is the space between the notes
– Claude Debussy

I recently spent some time looking at a slow shutdown issue. When product X was installed, the system was taking 34 seconds to shut down, vs 22 seconds when the product was not installed. Our automation team was able to reproduce this issue using the Full Boot Assessment in the Windows ADK. This gave us a very good place to start. In addition to a regular .etl trace file, the ADK also provides some extra information to help organize the millions of system events into a coherent narrative.

Image

After comparing this trace to a normal shutdown, it was clear the the IO Shutdown System section was much longer than normal — 10 seconds vs. close to 0 seconds in the normal case.

So, I got out my performance analysis toolbox and looked at:

– CPU usage
– Wait chains
– Disk usage

None of these approaches proved fruitful. There was very little CPU activity, nobody seemed to be waiting on anyone else, and disk usage was minimal. How could performance be impacted when nobody was doing anything? I reached out to a colleague and mentioned the major facts: slow shutdown, 10 second delay during the IO Shutdown phase, not much activity. His psychic response? “10 seconds? If it is exactly 10 seconds it sounds like a timeout”. I looked at a couple more traces and sure enough, the delay was always 10 seconds plus a tenth of a second or so.

We figured out the problem in the end but the big takeaway for me is a new tool for my performance analysis toolbox — timeout analysis, where the space between the notes is more important than the notes themselves.

Posted in Performance | Tagged , , | Leave a comment

Build 2014: Favourite Sessions

Build2014

Build 2014 is a distant memory at this point. Luckily, all sessions are still available for viewing on Channel 9. Here are my favourite sessions from the conference.

Modern C++: What You Need to Know
If you’re still in the dark about C++11/14, check out Herb Suter’s talk on the power and ease of using the new additions to the C++ language. There’s also a great section in the talk about the performance benefits of C++ arrays (and contiguous arrays in general).

Tips and Tricks in Visual Studio 2013
Cathy Sullivan presents an hour of ways to increase productivity and decrease frustration in Visual Studio 2013. If you use Visual Studio on any sort of regular basis, you will benefit from this hour. Ctrl+Q is your friend.

Diagnosing Issues in Windows Phone XAML Apps Using Visual Studio
I’ve never written a Windows Phone App but still found this session very interesting. Dan Taylor uses Visual Studio to analyze CPU, UI throughput, memory, and energy efficiency issues. Visual Studio debugging and analysis tools have come a *long* way

Windows and the Internet of Things
Steve Teixeira helped me understand what the internet of things is all about. Great presentation and great content. Most of Build 2014 was focused on convergence of Windows, Phone, and XBox; this session introduced me to some divergent flavours of Windows that I had not heard of.

Native Code Performance on Modern CPUs: A Changing Landscape
There was a noticeable lack of low-level sessions at Build 2014 and so the minor headache induced by Eric Brumer’s complex talk was most welcome. The big takeaway: with ever-faster CPUs, memory operations have become the bottleneck. Dovetails nicely with Herb Suter’s session above, but goes much more in-depth.

Avoiding Cloud Fail: Learning from the Mistakes of Azure with Mark Russinovich
As always, Russinovich is engaging and informative. Though the talk is focused on cloud there are lessons here for all types of development, for example, pragmatic exception handling strategies.

Posted in Performance, Training, Troubleshooting | Tagged , , , , | 2 Comments

Visual Studio Productivity Power Tools 2013 Kills Productivity

Update: This issue has been resolved and verified. You can get it from Visual Studio Gallery at http://visualstudiogallery.msdn.microsoft.com/dbcb8670-889e-4a54-a226-a48a15e4cace

At Build 2014, Microsoft did a suitable job of persuading me to run the latest version of Visual Studio. I updated to Visual Studio 2013 Update 2 RC and also installed the Productivity Power Tools 2013 extension. Soon afterwards, I started seeing long-lasting hangs in Visual Studio while editing my fairly modest C++ solution.

I initially tried to ignore the problem by killing and restarting Visual Studio, but this quickly compounded my frustration. My next tactic was to record one of these hangs using Windows Performance Recorder (see my previous post for background on this tool).

Upon opening the recorded trace in Windows Performance Analyzer, I came to the conclusion that Visual Studio had stopped pumping Windows messages. As we see in the graph below, thread 5959 went 179 seconds without checking for new messages. Why?

vs-hang1

One possibility was that the UI thread was waiting on another thread. However, when I opened the CPU Usage (Sampled) graph and filtered it to thread 5956, I could see that the thread was pegging the CPU. My machine has four logical processors and so 25% CPU usage means that one of my processors was being utilized completely.

vs-hang2

After loading symbols, I was able to drill down into the aggregated stack. I didn’t know exactly what I was looking for, but I eventually saw some enlightening function names on the stack.

vs-hang3

The aggregated stack usage showed that a lot of processing time was being spent in SolutionErrorFilter.dll. At this point, I remembered that I had installed the Productivity Power Tools 2013 extension so that I could see errors highlighted in Solution Explorer.

After proving to myself that SolutionErrorFilter.dll was not part of the base Visual Studio 2013 installation (it was installed to the extensions folder), I uninstalled the Productivity Power Tools 2013 extension. At this point, somewhat ironically, my productivity increased back to normal levels.

Big thanks to Microsoft for releasing symbols for Visual Studio, as well as for appropriately naming their binaries.

Despite this issue, I still recommend the Build 2014 talk entitled Tips and Tricks in Visual Studio 2013. I will post other favourite talks soon.

Update (from Microsoft): 

Thank you for your feedback. We are aware of the issue and are investigating a fix. We’re tracking this issue through connect id 816883.http://connect.microsoft.com/VisualStudio/feedback/details/816883/ide-becomes-unusable-if-error-list-contains-too-many-entries

We will resolve this issue as a duplicate of that one. You can vote 816883, but we’re already looking to make a fix.

Aside | Posted on by | Tagged , , | 4 Comments

Troubleshooting Windows Startup Using Autoruns

Recently, my Windows laptop started the odd behaviour of hanging every time I logged in. It didn’t blue screen; rather, it just froze completely every time the desktop came up. This is a pretty uncommon scenario so I figured that one of my installed applications was misbehaving. I didn’t have the time nor the inclination to muck around with the Run registry key and its ilk, so I turned to Autoruns for Windows to help me diagnose the issue.

Autoruns lets you view and disable pretty much anything that starts up with Windows — drivers, services, Explorer extensions, tasks that run at Logon, and many others.

Autoruns Main UI

Since I was seeing the hang on Logon, I rebooted into Safe Mode and used Autoruns to disable various programs listed under the Logon tab. After going through this cycle a couple of times, I figured out that my VPN client was causing the hang.

Unfortunately I needed the VPN client to connect to my corporate network so I had to keep digging. Eventually, I had a vague recollection of seeing this problem before, when enabling kernel debugging on a test machine, but not connecting the debugger. <SMACK> For some reason, I had enabled kernel debugging then forgotten about it. I quickly disabled this option via ‘bcdedit /debug off ‘and reboot my machine.

After getting my system up and running again, I used Autoruns to re-enable the programs that I had previously disabled. Now I just have to remember to not enable kernel debugging on my dev box…

Posted in Troubleshooting | Tagged | Leave a comment

Writing a WPA Regions of Interest File

In my previuos article I wrote a very simple Windows Performance Recorder profile to enable a particular ETW Provider. After recording a trace, the next step is to open it up in Windows Performance Analyzer (WPA) to see what is going on. There can be a lot of events in a trace and so anything that can help with pattern detection is most welcome.

One nice feature in WPA (starting in Windows 8.1) is called Regions of Interest. This allows you to link any two events so that they appear together in the UI. For example, if you log at the start and end of a method, it is quite natural that you would want to link these events together so that you can easily figure out the duration of the method, what else happened on the system during that time period, etc…

Example: I want to know how much CPU time my HCKServiceValidation function takes. I instrument the function with events at the start and end. Next, I record a trace, open it up in WPA and add my Regions of Interest file. After adding a couple of graphs and filtering on my process,  I get something very cool — I can see that the HCKServiceValidation function took 0.12 seconds to run, and maxed out at about 25% CPU usage.

Function CPU Usage

The Regions of Interest file is quite simple. You need only define a couple of GUIDs, and then you can link events via their event IDs.

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<?Copyright (c) Microsoft Corporation. All rights reserved.?>
<InstrumentationManifest>
 <Instrumentation>
 <Regions>
 <RegionRoot Guid="{934E499F-A064-4DA4-8D96-AAEF60E37F52}"
 Name="WHQL Region File Root"
 FriendlyName="Root">
 <Region Guid="{0EC1D460-B7C7-47F8-987C-6FA51DCA24BC}"
 Name="Region-HCK-Service-Validation"
 FriendlyName="HCK Service Validation">
 <Start>
 <Event Provider="{7D0810B4-15F2-4D2F-9C88-C885B330D944}" Id="2" Version="0" />
 </Start>
 <Stop>
 <Event Provider="{7D0810B4-15F2-4D2F-9C88-C885B330D944}" Id="3" Version="0" />
 </Stop>
 </Region> 
 </RegionRoot>
 </Regions>
 </Instrumentation>
</InstrumentationManifest>

Once you open your trace in WPA, you can add your Regions of Interest file under Trace->Trace Properties.

Adding Regions of Interest File

If any of the configured regions show up in the trace, you will see an additional graph called System Activity->Regions of Interest, which is what I displayed above. More information on MSDN.

Posted in Performance | Tagged , | Leave a comment

Writing a Very Simple WPR Profile

wpa1

In a previous post I discussed the joys of using the Windows Performance Toolkit to record and analyze Windows slowdowns. In summary, you can record a system trace using Windows Performance Recorder (WPR), and then use Windows Performance Analyzer (WPA) to visually drill into CPU, memory, & disk usage to figure out what is causing your slowdown.

The power of WPT does not end there. You can instrument and register your own Windows binaries to log events that can appear side by side with Windows system events. If you’ve ever tried to correlate your application logs with the Windows Event Viewer, or Process Monitor, you will immediately see the value here.

There are few steps involved in making this happen:

  1. Instrument your binary with Event Tracing for Windows (ETW) logging. ETW is a high performance logging mechanism that is baked into Windows. The instrumentation process is different for desktop apps, Windows Store apps, and drivers but all are supported.
  2. Register your binary as an ETW provider on the system.
  3. Configure WPR to enable your provider, and run your scenario. (You could also use xperf or an ETW controller of your choice.)
  4. Save the trace and open it with WPA.

The first two steps are out of scope of this article, but they are not particularly difficult thanks to some decent tools (ecmangen, mc, wevtutil). This article is a good starting point. [Update: Article is now only available as a .chm file. Download the April 2007 edition of MSDN Magazine from the link]

The next step is to create a Windows Performance Recorder Profile (wprp) file so that WPR knows how to enable your provider. MSDN has an article on authoring wprp profiles, but it is a little more complicated than we need.

I have cribbed from a couple of sources (including Bruce Dawson) and come up with a wprp profile that is about as simple as you can get. Comments inline.


 
  

 
 
 
 
 

 
 

 <!-- A fully specified .wprp file should have four profiles, with DetailLevel set to Verbose and Light and with Logging Mode set to Memory and File. WPR enforces that the name conforms to Profile.Level.OutputType --> 
 
  
 
 
 
 
 
 
 
 

  
 

 

  

 

You can add this profile to WPR and enable any other providers that you like.

WPR Profiles

Now you just need to start recording, run your scenario, and save your trace to disk. When you open the trace, you should magically see your provider and its associated events under System Activity -> Generic Events.

WPA Generic Events

Mission accomplished! I can now see exactly what happens on the system during my events of interest. Starting in WPT 8.1, you can even link events together using Regions of Interest. This is pretty well documented so I’ll save it for another time (or never).

Posted in Performance | Tagged , , | 3 Comments