Check out the new USENIX Web site.
USENIX, The Advanced Computing Systems Association

2007 USENIX Annual Technical Conference

Pp. 351356 of the Proceedings

Short Paper: Exploring Recovery from Operating System Lockups

Francis M. David, Jeffrey C. Carlyle, and Roy H. Campbell, University of Illinois at Urbana-Champaign


Operating system lockup errors can render a computer unusable by preventing the execution other programs. Watchdog timers can be used to recover from a lockup by resetting the processor and rebooting the system when a lockup is detected. This results in a loss of unsaved data in running programs. Based on the observation that volatile memory is not affected when a processor a reset occurs, we present an approach to recover from a watchdog reset with minimal or zero loss of application state. We study the resolution of lockup conditions using thread termination and using exception dispatch. Thread termination can still result in a usable system and is already used as a recovery strategy for other errors in Linux. Using exceptions allows developers to write code to handle a lockup within the erroneous thread and attempt application transparent recovery. Fault injection experiments show that a significant percentage of lockups can be recovered by thread termination. Exception handling further improves the recoverability of the operating system.
  • View the full text of this paper in HTML and PDF. Listen to the presentation and Q & A in MP3 format.
    Click here if you have forgotten your password Until June 2008, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, 2007 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.
To become a USENIX member, please see our Membership Information.

Last changed: 29 August 2007 ac