SRE 101, Revisited

Thursday, 31 August, 2017 - 13:30-14:30

Laura Nolan, Google


This presentation replaces the talk by Dinah McNutt, who is unable to attend. Laura Nolan will repeat her SRE 101 talk from yesterday; if you missed that session because the room was full, this is your opportunity to attend.

The purpose of an SRE team is to keep its services up, reliable, performant and efficient. How do effective SRE teams do this?

We'll give an overview of the key SRE competencies: monitoring and alerting, incident response, disaster recovery, performance and efficiency, change management, and capacity planning.

We'll also look at the habits of successful SRE teams and some common pitfalls.


Laura Nolan has been a Site Reliability Engineer at Google for four years, working on large data infrastructure projects and, most recently, networking. Her background is in software engineering and computer science. She wrote the 'Managing Critical State' chapter in the O'Reilly SRE book and is co-chair of SREcon17 Europe/Middle East/Africa.

@conference {205532,
author = {Laura Nolan},
title = {{SRE} 101, Revisited},
year = {2017},
address = {Dublin},
publisher = {USENIX Association},
month = aug
}