Managing Misfortune for Best Results

Thursday, August 30, 2018 - 14:00–14:45

Kieran Barry, SRE @ Google


Simulated Outage training games are a regular part of SRE training at Google and elsewhere. They are great opportunities to simulate an outage and to practice debugging and escalation. Perhaps equally important, they provide an opportunity to simulate the stress of an outage for an oncall engineer.

This talk describes techniques to ensure a productive training environment. It will emphasise the importance of providing context to the trainee engineer and of calibrating the level of stress to the needs of the student. Since training games are often observed by whole teams, the talk will also cover ways to maintain engagement among the group of observers.

Finally, it will cover anti-patterns to avoid.

Kieran Barry, SRE @ Google

Kieran has worked at Google as an SRE on the search team for the past four years. He has also volunteered on new-SRE education as part of the SREEDU team.

@inproceedings {218851,
author = {Kieran Barry},
title = {Managing Misfortune for Best Results},
booktitle = {SREcon18 Europe/Middle East/Africa (SREcon18 Europe)},
year = {2018},
address = {Dusseldorf},
url = {},
publisher = {{USENIX} Association},