Tales from the VOID: The Scary Truth about Incident Metrics

Monday, March 14, 2022 - 1:50 pm to 2:30 pm

Courtney Nash, Verica

Abstract: 

This talk presents research drawn from the VOID, a new open database of public incident reports. Containing nearly 2,000 reports from 660 organizations, the database allows for more structured review and research of software-related incident reporting. Key results from our research challenge standard industry practices for incident response and analysis, such as tracking Mean Time To Resolve (MTTR) and using Root Cause Analysis (RCA) methodology. In particular, we demonstrate how unreliable MTTR can be, and how RCA can lead to environments where people are less likely to admit mistakes and speak up about things that could lead to future incidents. We propose alternative metrics (SLOs and cost of coordination data), practices (Near Miss analysis), and mindsets (humans are the solution, not the problem) to help organizations better learn from their incidents and make their systems safer and more resilient.
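As a rough illustration of one reason a mean-based metric like MTTR can be unreliable, here is a minimal sketch using made-up incident durations (hypothetical values, not data from the VOID, and not the analysis method used in the talk). It shows how a single long incident can dominate the mean while the median barely moves.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes (illustrative only; NOT VOID data).
# The last value represents one unusually long incident.
durations = [12, 18, 25, 30, 41, 55, 60, 75, 90, 2880]


def summarize(minutes):
    """Return (mean, median) time to resolve, in minutes."""
    return mean(minutes), median(minutes)


# Mean (MTTR) and median over all incidents.
mttr, median_ttr = summarize(durations)
print(f"MTTR: {mttr:.1f} min, median: {median_ttr:.1f} min")

# Recompute after dropping the single long-tail incident.
mttr_wo, median_wo = summarize(durations[:-1])
print(f"Without the outlier -> MTTR: {mttr_wo:.1f} min, median: {median_wo:.1f} min")
```

With these invented numbers, one incident moves the mean from about 45 minutes to about 329 minutes, while the median shifts only from 41 to 48 minutes. A team tracking MTTR alone could read that jump as a large change in performance when nine of the ten incidents are unchanged.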


Courtney Nash is a researcher focused on system safety and failures in complex sociotechnical systems. An erstwhile cognitive neuroscientist, she has always been fascinated by how people learn, and the ways memory influences how they solve problems. Over the past two decades, she’s held a variety of editorial, program management, research, and management roles at Holloway, Fastly, O’Reilly Media, Microsoft, and Amazon. She lives in the mountains where she skis, rides bikes, and herds dogs and kids.

SREcon22 Americas Open Access Sponsored by Blameless

BibTeX
@conference {278108,
author = {Courtney Nash},
title = {Tales from the {VOID}: The Scary Truth about Incident Metrics},
year = {2022},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}
