Music is the space between the notes
– Claude Debussy
I recently spent some time looking at a slow shutdown issue. When product X was installed, the system was taking 34 seconds to shut down, vs. 22 seconds when the product was not installed. Our automation team was able to reproduce this issue using the Full Boot Assessment in the Windows ADK. This gave us a very good place to start. In addition to a regular .etl trace file, the ADK also provides some extra information to help organize the millions of system events into a coherent narrative.
After comparing this trace to a normal shutdown, it was clear that the IO Shutdown System section was much longer than normal: 10 seconds vs. close to 0 seconds in the normal case.
So, I got out my performance analysis toolbox and looked at:
– CPU usage
– Wait chains
– Disk usage
None of these approaches proved fruitful. There was very little CPU activity, nobody seemed to be waiting on anyone else, and disk usage was minimal. How could performance be impacted when nobody was doing anything? I reached out to a colleague and mentioned the major facts: slow shutdown, a 10 second delay during the IO Shutdown phase, not much activity. His psychic response? “10 seconds? If it is exactly 10 seconds it sounds like a timeout.” I looked at a couple more traces and, sure enough, the delay was always 10 seconds plus a tenth of a second or so.
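The colleague's hunch can be turned into a rough rule of thumb: a delay that is nearly identical across traces and lands just above a round number is probably a timer expiring rather than real work. The sketch below is a hypothetical helper, not part of the ADK or any Windows tool. The list of candidate timeout values and the tolerances are assumptions chosen for illustration, and the durations in the usage lines are made-up sample values, not measurements from these traces.

```python
from statistics import pstdev

# Round values (in seconds) that are plausible timeout choices.
# This list is an illustrative assumption, not a documented set of Windows timeouts.
COMMON_TIMEOUTS = (1.0, 2.0, 5.0, 10.0, 15.0, 30.0, 60.0)


def suspected_timeout(durations, candidates=COMMON_TIMEOUTS,
                      max_overshoot=0.5, max_spread=0.25):
    """Return the candidate timeout the observed delays cluster just above, or None.

    A delay caused by a timeout tends to be slightly longer than the timeout
    itself (the timer fires, then the waiting code takes a moment to resume),
    and it should be nearly the same in every trace.
    """
    if not durations:
        return None
    # Widely varying delays suggest the time is spent doing work, not waiting on a timer.
    if pstdev(durations) > max_spread:
        return None
    for timeout in candidates:
        # Every delay must fall at or just above this candidate value.
        if all(timeout <= d <= timeout + max_overshoot for d in durations):
            return timeout
    return None


# Hypothetical per-trace durations (seconds) of a slow phase.
print(suspected_timeout([10.12, 10.09, 10.15]))  # 10.0
print(suspected_timeout([3.4, 7.9, 12.1]))       # None
```

The check only flags a suspect. It does not say who set the timer or why it expired, which still takes digging through the trace.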
We figured out the problem in the end, but the big takeaway for me is a new tool for my performance analysis toolbox: timeout analysis, where the space between the notes is more important than the notes themselves.