Every Windows system administrator should watch this video: Debugging Production Systems.
The video is all about failures, transient failures, and post-mortem debugging. It’s amazing how the speaker covers the history, the NTSB, and disaster porn, and then makes it all come together in a debugger for Node.js. It gives you a perspective on how to approach debugging production systems.
The good news is that Windows already has good debugging tools and symbol files, which let you trace an issue down to the module or function level.
I also found this blog post on replay debugging using VMware Workstation.
Golden Rule: There are no one-off errors in production.
If it happened once, there’s a good chance that it can happen again. It might be a good idea to capture a full memory dump or a VM snapshot and start WinDbg.
PS: You should check out Bryan Cantrill’s answer in Google Groups here.