Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Author(s): 

Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Swaminathan Sundararaman, Xing Lin, and Robert Ricci

Understanding fault models is an important criterion for building robust systems. Decades of research have developed mature failure models such as fail-stop, fail-partial, fail-transient, and Byzantine failures. We highlight an under-studied “new” failure type: fail-slow hardware, i.e., hardware that is still running and functional but in a degraded mode, slower than its expected performance. We found that all major hardware components can exhibit fail-slow faults. For example, disk throughput can drop by three orders of magnitude to 100 KB/s due to vibration; CPUs can unexpectedly run at half-speed due to lack of power; and network card performance can collapse to Kbps level due to buffer corruption and retransmission.
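
To make the distinction between fail-stop and fail-slow concrete, here is a minimal sketch, not taken from the article, that classifies a component by comparing observed throughput with expected throughput. The names `Sample`, `slowdown_factor`, and `classify`, and the 2x threshold, are all assumptions chosen for illustration. The 100 MB/s disk baseline is inferred from the abstract's "three orders of magnitude to 100 KB/s" and the 3 GHz CPU clock is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One throughput observation for a hardware component (hypothetical type)."""
    component: str
    expected: float  # expected rate, in any unit (bytes/s, Hz, ...)
    observed: float  # observed rate, in the same unit


def slowdown_factor(sample: Sample) -> float:
    """Return how many times slower than expected the component is running."""
    if sample.observed <= 0:
        return float("inf")
    return sample.expected / sample.observed


def classify(sample: Sample, slow_threshold: float = 2.0) -> str:
    """Label a sample as fail-stop, fail-slow, or healthy.

    slow_threshold is an assumed cutoff, not a value from the article.
    """
    if sample.observed <= 0:
        return "fail-stop"  # component produces no output at all
    if slowdown_factor(sample) >= slow_threshold:
        return "fail-slow"  # still functional, but degraded
    return "healthy"


if __name__ == "__main__":
    samples = [
        Sample("disk", expected=100e6, observed=100e3),  # inferred baseline
        Sample("cpu", expected=3.0e9, observed=1.5e9),   # hypothetical clock
        Sample("nic", expected=1e9, observed=0.0),       # dead link
    ]
    for s in samples:
        print(s.component, classify(s), slowdown_factor(s))
```

A fixed ratio threshold is the simplest possible choice; a real detector would probably need per-component baselines and some tolerance for transient slowdowns.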

Download Article: 
Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems (PDF)
Article Section: 
STORAGE
;login: issue: 
Summer 2018, Vol. 43, No. 2