Deploying SRE Training Best Practices to Production: What We Learned (a.k.a. Strapping Jetpacks on Unicorns, the Postmortem)

Friday, 31 August, 2018 - 14:50–15:30

Jennifer Petoff, Google Ireland


In 2015, Andrew Widdowson gave a talk at SREcon Americas titled “From Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams.” His recommendations were based on nearly a decade of personal experience ramping up new SREs at Google.

Fast forward to 2018. Google SRE now has a global training organization called SRE EDU. In many ways, SRE EDU was charged with developing a formal program to deploy these training best practices into production. Our goal? Spin up a globally consistent and reliable education program for Site Reliability Engineering.

Of course, a cornerstone of SRE practice is the blameless postmortem. This talk addresses what we learned when scaling training best practices globally. Along the way, we’ll share tips for small and large organizations alike on how you can learn from our experience and ensure that you deliver an effective training experience for your SREs.

Jennifer Petoff is a Senior Program Manager for Google's Site Reliability Engineering team based in Dublin, Ireland, and is one of the co-editors of the best-selling book Site Reliability Engineering: How Google Runs Production Systems. Jennifer currently co-leads the global SRE EDU training program at Google.

