Managing Misfortune for Best Results

Thursday, August 30, 2018 - 14:00-14:45

Kieran Barry, SRE @ Google

Abstract: 

Simulated Outage training games are a regular part of SRE training at Google and elsewhere. They are great opportunities to simulate an outage and to practice debugging and escalation. Perhaps equally important, they let an oncall engineer experience the stress of an outage in a safe setting.

This talk describes techniques to ensure a productive training environment. It will emphasise the importance of providing context to the trainee engineer. It will also talk about the importance of calibrating the level of stress to the needs of the student. Since training games are often observed by whole teams, the talk will cover ways to maintain engagement among the group of observers.

Finally, it will talk about potential anti-patterns to be avoided.

Kieran Barry, SRE @ Google

Kieran has worked at Google as an SRE on the search team for the past four years. He has also volunteered on new-SRE education as part of the SREEDU team.

BibTeX
@inproceedings {218851,
author = {Kieran Barry},
title = {Managing Misfortune for Best Results},
booktitle = {SREcon18 Europe/Middle East/Africa (SREcon18 Europe)},
year = {2018},
address = {Dusseldorf},
url = {https://www.usenix.org/node/218852},
publisher = {{USENIX} Association},
}