USENIX, The Advanced Computing Systems Association

NSDI '06 — Abstract

Pp. 225–238 of the Proceedings

Subtleties in Tolerating Correlated Failures in Wide-area Storage Systems

Suman Nath, Microsoft Research; Haifeng Yu and Phillip B. Gibbons, Intel Research Pittsburgh; Srinivasan Seshan, Carnegie Mellon University


High availability is widely accepted as an explicit requirement for distributed storage systems, and tolerating correlated failures is a key issue in achieving it in today's wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using several real-world failure traces, we qualitatively answer four important questions about how to design systems that tolerate such failures. Based on our results, we identify a set of design principles that system builders can use to tolerate correlated failures. We show that these lessons can be applied effectively by incorporating them into IrisStore, a distributed read-write storage layer that provides high availability. Our results from running IrisStore on PlanetLab over an 8-month period demonstrate its ability to withstand large correlated failures and to meet preconfigured availability targets.
The Proceedings are published as a collective work, © 2006 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.

Last changed: 1 June 2006 ch