The On-Call Review: Building a Team Culture That Rejects Noise

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Thursday, March 26, 2020 - 11:30 am–12:05 pm

Dan Slimmon, Hashicorp

Abstract: 

Alerts are only useful if you believe what they're saying. But over time, left unchecked, bogus alerts will make up more and more of a team's alert load. How can we prevent this proliferation of noise?

The alert review is a process I've been using successfully for over a decade, on lots of different teams. It's a way for a team to identify noisy alerts and, over time, develop healthier alerting habits. The process focuses on actionability (the ability of the recipient to act upon the problem an alert indicates) and investigability (the quality of requiring new insight rather than rote runbook-following to resolve the problem). Its benefits can be immense and long-lasting.

Dan Slimmon is an S.R.E. DevOpsAdmin™ at Hashicorp, a software company that makes lots of super useful tools for ops folks like you and me. He enjoys finding mathy and medicine-adjacent solutions to the problems of running busy web applications. He's got a lot of cat pics on his phone and would be delighted to show you some!

BibTeX
@conference {247302,
author = {Dan Slimmon},
title = {The On-Call Review: Building a Team Culture That Rejects Noise},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}