OSDI 2000 Abstract

Exploring Failure Transparency and the Limits of Generic Recovery

David E. Lowell, Compaq Computer Corp.; Subhachandra Chandra and Peter M. Chen, University of Michigan


We explore the abstraction of failure transparency, in which the operating system provides the illusion of failure-free operation. To provide failure transparency, an operating system must recover applications after hardware, operating system, and application failures, and must do so without help from the programmer or unduly slowing failure-free performance. We describe two invariants that must be upheld to provide failure transparency: one that ensures sufficient application state is saved to guarantee the user cannot discern failures, and another that ensures sufficient application state is lost to allow recovery from failures affecting application state. We find that several real applications get failure transparency in the presence of simple stop failures with overhead of 0-12%. Less encouragingly, we find that applications violate one invariant in the course of upholding the other for more than 90% of application faults and 3-15% of operating system faults, rendering transparent recovery impossible for these cases.

Last changed: 16 Jan. 2002 ml