Check out the new USENIX Web site.

USENIX Home . About USENIX . Events . membership . Publications . Students
OSDI '04 — Abstract

Pp. 61–76 of the Proceedings

Understanding and Dealing with Operator Mistakes in Internet Services

Kiran Nagaraja, Fábio Oliveira, Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen, Rutgers University

Abstract

Operator mistakes are a significant source of unavailability in modern Internet services. In this paper, we first characterize these mistakes by performing an extensive set of experiments using human operators and a realistic three-tier auction service. The mistakes we observed range from software misconfiguration, to fault misdiagnosis, to incorrect software restarts. We next propose to validate operator actions before they are made visible to the rest of the system. We demonstrate how to accomplish this task via the creation of a validation environment that is an extension of the online system, where components can be validated using real workloads before they are migrated into the running service. We show that our prototype validation system can detect 66% of the operator mistakes that we have observed.

  • View the full text of this paper in HTML and PDF.
    Click here if you have forgotten your password Until December 2005, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.

  • If you need the latest Adobe Acrobat Reader, you can download it from Adobe's site.
To become a USENIX Member, please see our Membership Information.

 

?Need help? Use our Contacts page.

Last changed: 12 Oct. 2004 aw
Technical Program
OSDI '04 Home
USENIX home