Examining System Crashes and Hangs
LISA: Where systems engineering and operations professionals share real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world.
The LISA conference has long served as the annual vendor-neutral meeting place for the wider system administration community. The LISA14 program recognized the overlap and differences between traditional and modern IT operations and engineering, and developed a highly curated program around five key topics: Systems Engineering, Security, Culture, DevOps, and Monitoring/Metrics. The program included 22 half- and full-day training sessions, 10 workshops, and a conference program consisting of 50 invited talks, panels, refereed paper presentations, and mini-tutorials.
Max Bruning, Joyent

Max Bruning began using and programming Unix-based systems while earning a Master's degree at Columbia University in the late 1970s. He has spent many years doing kernel development and teaching Unix courses at various companies. He has done consulting and training work for Bell Labs, AT&T, Motorola, Sun Microsystems, HP, Siemens-Nixdorf, and other companies. In September 2010, he started porting Linux KVM to SmartOS for Joyent. He is currently the Training Director at Joyent.

Max Bruning. "Examining System Crashes and Hangs." USENIX Association, Seattle, WA, November 2014.
This half-day tutorial examines debugging tools and techniques. It concentrates on kernel debugging, with emphasis on the tools available on SmartOS, and also covers tools for other operating systems (e.g., Linux and *BSD). The tutorial starts with a description of the tools available on each OS, then works through several example debugging sessions, using the tools and the source code to drill down to the root cause of each problem. For completeness, part of the discussion and one lab address user-level problems. To follow along, attendees should have access to at least one machine (a virtual machine is fine) running SmartOS. Access to Linux and FreeBSD systems will also be helpful.
Developers, devops, and experienced administrators who have encountered crashes will benefit from this tutorial.
Attendees should leave with some knowledge of how to debug kernel crashes and hangs.
Topics include:
- Types of Bugs
  - Kernel Panics and Application Crashes
  - Hangs
  - Errors
- SmartOS, illumos, and Solaris 10 derivatives
  - Intro to mdb(1)
  - The 5% You Need to Know
  - Intro to DTrace
  - Techniques
- Linux
  - kdb, *dbg, crash, etc.
  - systemtap and other DTrace equivalents
- *BSD
  - kgdb, ddd, ddb, etc.
  - DTrace
- All systems
  - Print statements
  - Other techniques and strategies
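As a small illustration of the "print statements" technique listed under All systems, the sketch below shows one common user-level pattern in C: a macro that tags each message with its source file and line and writes it to stderr. The names `debug_format`, `DEBUG_PRINT`, and the `DEBUG_LEVEL` switch are hypothetical, chosen for this example only; they are not part of any library or of the tutorial's material.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compile-time switch: build with -DDEBUG_LEVEL=0 to silence. */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 1
#endif

/*
 * Hypothetical helper: format "file:line: message" into buf, truncating
 * if necessary, and return the length of the resulting string.
 */
static int debug_format(char *buf, size_t len, const char *file, int line,
                        const char *fmt, ...)
{
    va_list ap;
    int n;

    if (len == 0)
        return 0;
    n = snprintf(buf, len, "%s:%d: ", file, line);
    if (n < 0) {                      /* encoding error: return empty string */
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)n >= len)             /* prefix alone filled the buffer */
        return (int)strlen(buf);
    va_start(ap, fmt);
    vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return (int)strlen(buf);
}

/*
 * Write one tagged line to stderr. The C standard guarantees stderr is
 * never fully buffered, and each message ends in a newline, so the text
 * is emitted before a crash that happens right after this call.
 */
#define DEBUG_PRINT(...)                                          \
    do {                                                          \
        if (DEBUG_LEVEL > 0) {                                    \
            char dbg_buf_[256];                                   \
            debug_format(dbg_buf_, sizeof dbg_buf_, __FILE__,     \
                         __LINE__, __VA_ARGS__);                  \
            fprintf(stderr, "%s\n", dbg_buf_);                    \
        }                                                         \
    } while (0)
```

A call such as `DEBUG_PRINT("refcnt=%d", obj->refcnt);` prints something like `foo.c:42: refcnt=3`. Inside a kernel the same idea applies, but with the kernel's own logging routine instead of stdio: `printk()` on Linux, `cmn_err()` on illumos/SmartOS, and the kernel `printf()` on FreeBSD. On systems with DTrace, a probe can often replace a print statement without rebuilding the code.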