System Crash, Plane Crash: Lessons from Commercial Aviation and Other Engineering Fields

Friday, November 03, 2017 - 4:00 pm–5:30 pm

Jon Kuroda, University of California, Berkeley


Commercial aviation, civil and structural engineering, emergency medicine, and the nuclear power industry all have hard-earned lessons gained over their respective histories, histories that stretch back decades or even centuries. Often acquired at a bloody cost, these experiences led to the development of environments typified by stringent regulation, strict test and design protocols, and demanding training and education requirements—all driven by a need to minimize loss of life.

In stark contrast, the computer industry in general and systems administration specifically have developed in a relatively unrestricted environment, largely free, outside of a few niche fields, from the regulation and external control seen in life-safety critical fields.

Despite these major differences, these far more demanding environments still offer many lessons that systems administrators, designers, and engineers can apply to the design, development, and operation of computing systems.

We will look at incidents ranging from Air France 447 to Three Mile Island and at what we can learn from the experiences of those involved in both the incidents and the subsequent investigations. We will draw parallels between our field as a whole and these other, less forgiving fields in areas such as Education and Training, Monitoring, Design and Testing, Human Computer/Systems Interaction, Human Performance Factors, Organizational Culture, and Team Building.

We hope that you will take away not just a list of object lessons but also a new perspective and lens through which to view the work you do and the environment in which you do it.


Jon is a sysadmin and research engineer at the Department of Electrical Engineering at the University of California, Berkeley, where he spends his days (and nights) puzzling over misbehaving Spark clusters, untangling network cable incompatibilities, debugging business processes, trying to manage datacenter spaces, and still having a social life, all while trying to keep up with dozens of computer science researchers. Three out of five isn't bad, right?


@conference {207201,
author = {Jon Kuroda},
title = {System Crash, Plane Crash: Lessons from Commercial Aviation and Other Engineering Fields},
year = {2017},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = oct
}
