Bryan Cantrill at QCon

Every Windows system administrator should watch this video: Debugging Production Systems.

The video is all about failures, transient failures, and post-mortem debugging. It’s amazing how he covers the history, the NTSB, and disaster porn, and then makes it all come together in a debugger for Node.js. It gives you a perspective on how to approach debugging production systems.

The good news is that Windows already has good debugging tools, and with symbol files you can trace an issue down to the module or function level.

I also found this blog post on replay debugging using VMware Workstation.

Golden Rule: There are no one-off errors in production.

If it happened once, there’s a good chance that it will happen again. It might be a good idea to capture a full memory dump or a VM snapshot and start WinDbg.
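
As a rough sketch of what "get a full dump and start the debugger" could look like when scripted, the Python below builds command lines for two tools the post does not name: Sysinternals ProcDump (`-ma` writes a full memory dump) and `cdb`, the console debugger that ships alongside WinDbg (`-z` opens a dump file, `-c` runs debugger commands). It assumes both are installed and on PATH. The function names, the PID, and the dump path are hypothetical, and this is one possible workflow, not the one from the talk.

```python
import shutil
import subprocess
from pathlib import Path


def procdump_command(pid: int, dump_path: Path) -> list[str]:
    """Build a ProcDump command line that writes a full memory dump (-ma)."""
    return ["procdump", "-accepteula", "-ma", str(pid), str(dump_path)]


def cdb_analyze_command(dump_path: Path) -> list[str]:
    """Build a cdb command line: open the dump (-z), run !analyze -v, quit."""
    return ["cdb", "-z", str(dump_path), "-c", "!analyze -v; q"]


def capture_and_analyze(pid: int, dump_path: Path) -> None:
    """Capture a dump of a running process, then run automated analysis on it."""
    # Assumption: both tools are installed and reachable through PATH.
    for tool in ("procdump", "cdb"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    # check=False: this sketch assumes nothing about ProcDump's exit codes.
    subprocess.run(procdump_command(pid, dump_path), check=False)
    subprocess.run(cdb_analyze_command(dump_path), check=True)
```

Calling `capture_and_analyze(1234, Path("app.dmp"))` on the affected machine would leave `app.dmp` on disk, so the failure can still be examined after the process has been restarted.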

PS: You should check out Bryan Cantrill’s answer in Google Groups here.

