What Breaks Our Systems: A Taxonomy of Black Swans

Tuesday, October 30, 2018 - 3:00 pm–3:30 pm

Laura Nolan

Abstract: 

Black swan events: unforeseen, unanticipated, and catastrophic issues. These are the incidents that take our systems down, hard, and keep them down for a long time.

By definition, you cannot predict true black swans. But black swans often fall into categories that we’ve seen before.

This talk examines those categories and how we can harden our systems against them. They include unforeseen hard capacity limits, cascading failures, queries of death, hidden system dependencies, various forms of deadlock, and more.

Laura Nolan, N/A

Laura Nolan’s background is in Site Reliability Engineering, software engineering, distributed systems, and computer science. She wrote the ‘Managing Critical State’ chapter in the O’Reilly ‘Site Reliability Engineering’ book, and is co-chair of SREcon18 Europe/Middle East/Africa. Laura is currently enjoying a well-earned sabbatical (and tinkering with some of her own projects) after 15 years in industry, most recently at Google.

BibTeX
@conference {221754,
author = {Laura Nolan},
title = {What Breaks Our Systems: A Taxonomy of Black Swans},
year = {2018},
address = {Nashville, TN},
publisher = {{USENIX} Association},
month = oct,
}
