Sue Lueder and Betsy Beyer, Google
In the 2016 O'Reilly book Site Reliability Engineering, Google described our culture of blameless postmortems, and recommended that organizations institute a similar culture of postmortems after production incidents. This talk shares some best practices and challenges in designing an appropriate action item plan and subsequently executing that plan in a complex environment of competing priorities, resource limitations, and operational realities. We discuss best practices for developing high-quality action items (AIs) for a postmortem, plus methods of ensuring these AIs actually get implemented so that we don't suffer the same outage, or an even worse one, again. It's worth noting that Google teams are by no means perfect at formulating and executing postmortem action items. We still have a lot to learn in this difficult area, and are sharing our thoughts and strategies to give a starting point for discussion throughout the industry.
Sue Lueder joined Google as a Site Reliability Program Manager in 2014 and is on the team responsible for disaster testing and readiness, incident management processes and tools, and incident analysis. Before joining Google, Sue was a technical program manager and a systems, software, and quality engineer in the wireless and smart energy industries (Ingenu Wireless, Texas Instruments, Qualcomm). She has an M.S. in Organization Development from Pepperdine University and a B.S. in Physics from UCSD.
Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University. She holds degrees from Stanford and Tulane.