USENIX

Failure Recovery: When the Cure Is Worse Than the Disease

Authors: 

Zhenyu Guo, Sean McDirmid, Mao Yang, and Li Zhuang, Microsoft Research Asia; Pu Zhang, Microsoft Research Asia and Peking University; Yingwei Luo, Peking University; Tom Bergan, Microsoft Research and University of Washington; Madan Musuvathi, Zheng Zhang, and Lidong Zhou, Microsoft Research Asia

Abstract: 

Cloud services inevitably fail: machines lose power, networks become disconnected, pesky software bugs cause sporadic crashes, and so on. Unfortunately, failure recovery itself is often faulty; for example, recovery can inadvertently replicate a small failure to other machines, again and again, until the entire cloud service fails in a catastrophic outage, amplifying a small cold into a contagious, deadly plague! We propose that failure recovery should be engineered foremost according to the maxim of primum non nocere: that it “does no harm.” Accordingly, we must consider the system holistically when a failure occurs and recover only when observed activity safely allows for it.
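
The principle of recovering only when observed activity safely allows for it can be illustrated with a small sketch. This is not the authors' mechanism: the class name RecoveryGate, its methods, and the 10% threshold are all hypothetical, chosen only to show one way a recovery action might consult system-wide state before it proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryGate:
    """Hypothetical guard: approve recovery only while few machines are unhealthy."""

    cluster_size: int
    # Assumed threshold for illustration; the abstract does not specify one.
    max_unhealthy_fraction: float = 0.1
    unhealthy: set = field(default_factory=set)

    def report_failure(self, machine: str) -> None:
        """Record that a machine has been observed to fail."""
        self.unhealthy.add(machine)

    def report_recovered(self, machine: str) -> None:
        """Record that a machine is healthy again."""
        self.unhealthy.discard(machine)

    def may_recover(self, machine: str) -> bool:
        """Return True if recovering `machine` looks safe given cluster-wide state.

        If too large a fraction of the cluster is already unhealthy, automatic
        recovery is refused, on the assumption that it may be spreading the
        failure rather than containing it.
        """
        affected = self.unhealthy | {machine}
        return len(affected) / self.cluster_size <= self.max_unhealthy_fraction


gate = RecoveryGate(cluster_size=20)
gate.report_failure("m1")
first_decision = gate.may_recover("m1")   # 1 of 20 machines affected

gate.report_failure("m2")
gate.report_failure("m3")
later_decision = gate.may_recover("m3")   # 3 of 20 machines affected
```

With these assumed numbers, the first request is approved and the later one is refused, so a failure that appears to be spreading halts automatic recovery instead of being amplified by it.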
