Learning in SRE

Wednesday, March 25, 2026 - 1:55 pm–3:30 pm

John Allspaw, Adaptive Capacity Labs, and Colette Alexander, Resilience in Software Foundation

When a service has 99.95% availability, there’s a tendency to focus almost exclusively on the 0.05% that is keeping us from the promised land: 100%. But have you ever wondered what makes the 99.95% happen? You know, the “non-incident” time?

The one thing that makes non-incidents happen is learning. People learn in different ways, at different times, and asynchronously. Come and talk with us about the most critical, yet invisible, thing we do every day: learning.

John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010), as well as the foreword to The DevOps Handbook. His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation,” helped start the DevOps movement. John served as CTO at Etsy and holds an MSc in Human Factors and Systems Safety from Lund University.

Colette has been working as an engineering leader in the software industry for 10+ years. Her obsession with learning from incidents and Resilience Engineering began while managing teams at Spotify, and it eventually led her to pursue a Master of Science in Human Factors and Systems Safety at Lund University. She has led organizations in SRE and observability at HashiCorp and Cognite. She also maintains an active composition and recording career as a rock cellist, and lives in Ann Arbor, Michigan, with her husband, two kids, and rescue dog.

BibTeX
@conference {317453,
author = {John Allspaw and Colette Alexander},
title = {Learning in {SRE}},
year = {2026},
address = {Seattle, WA},
publisher = {USENIX Association},
month = mar
}