Deploying SRE Training Best Practices to Production: How We SRE'ed Our SRE Education Program

Thursday, October 3, 2019 - 09:00-09:45

Jennifer Petoff, Google Ireland, and JC van Winkel, Google Switzerland


Structured education is important for ramping up new SREs: it helps them build confidence and fight imposter syndrome. In this talk, we take a look behind the scenes of the SRE EDU Orientation curriculum at Google from both a technical and an organizational point of view, highlighting best practices that can be applied at organizations of all sizes. We’ll show how we applied SRE best practices to the program itself to minimize toil for the organizers (keyword: automation!) and keep the training software reliable and up to date.

By implementing judicious monitoring, we learned that hands-on exercises are a more effective way to ramp people up than one-way lectures. We built a rigged production system where an instructor can trigger outages that the students need to triage, mitigate, and resolve. Because the system is internal only, students cannot cause externally visible harm, which creates a safe learning environment that allows for experimentation.

Jennifer Petoff is a Senior Program Manager for Google's Site Reliability Engineering team based in Dublin, Ireland. She is the global lead for Google’s SRE EDU program and is one of the co-editors of the best-selling book, Site Reliability Engineering: How Google Runs Production Systems.

JC van Winkel has been teaching UNIX and programming languages since 1992, working for AT Computing, a small courseware spin-off of the University of Nijmegen, the Netherlands. JC joined Google's Site Reliability Engineering team in 2010 and is both a founding member and lead educator of the SRE education team, SRE EDU.

