Building Successful SRE in Large Enterprises—One Year Later

Wednesday, March 28, 2018 - 1:35 pm–2:15 pm

Dave Rensin, Google


At SRECon2017 I talked about the formation of a special group of Google SREs who go into the world and teach enterprise customers, via actual production systems, how to "do SRE" in their orgs. It was new when I presented it. It's one year later and we have a lot of interesting data about how it's going. Some things we thought would be hard weren't. Others were nigh on impossible. We've written many postmortems and learned a bunch of lessons you can only learn the hard way.

Things you can expect to learn:

  • Why it's easier to bootstrap SRE in a large traditional enterprise than in a cloud-native one!
  • Things enterprises assume are true, but aren't.
  • All the things we should have known better, but still learned the hard way, and how you can avoid them when bootstrapping SRE in your culture (or your customers' cultures).

Dave Rensin is a Google SRE Director leading Customer Reliability Engineering (CRE), a team of SREs pointed outward at customer production systems. Previously, he led Global Support for Google Cloud. As a longtime startup veteran, he has lived through an improbable number of "success disasters" and pathologically weird failure modes. Ask him how to secure a handheld computer by accidentally writing software to make it catch fire, why a potato chip can is a terrible companion on a North Sea oil derrick, or about the time he told Steve Jobs that the iPhone was "destined to fail."

