How Did Things Go Right? Learning More from Incidents

Monday, March 25, 2019 - 1:40 pm–2:10 pm

Ryan Kitchens, Netflix


Solely learning from failure isn't a fundamental—it's a limitation.

A look into the New View of Safety, Human & Organizational Performance, and Resilience Engineering shows us that safety, great performance, and sources of resilience do not come from the absence of failure but rather the presence of adaptive capacity.

Navigating a perfect storm in a world where availability is made up and the 9's don't matter requires expertise. This talk will describe more rewarding ways to approach incident investigation without overly focusing on failure prevention.

  • What's going on when it seems like nothing is happening?
  • When failure does occur, what's going to keep it from being worse?
  • How do teams adapt successfully when preventative techniques fail?
  • How should we prioritize the effort to develop systems that help us safely manage the consequences of failure?

These questions cannot be answered by trying to explain the causes of failure and fixing remediation items.

We will move the needle forward and increase our opportunity for learning from success with some fundamental and practical ways to get from "Why did things go wrong?" to "How did things go right?"

Ryan Kitchens, Netflix

Ryan Kitchens is a Site Reliability Engineer on the Core team at Netflix, where he works on building capacity across the organization to ensure its availability and reliability. Before that, Ryan was a founding member of the SRE team at Blizzard Entertainment.

SREcon19 Americas Open Access Videos Sponsored by

@conference {229531,
author = {Ryan Kitchens},
title = {How Did Things Go Right? Learning More from Incidents},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar,
}
