Dark Sky Camping: Reducing Alert Pollution with Modern Observability Practices

Tuesday, March 15, 2022, 4:45 pm - 5:30 pm

Kristin Smith, Campspot

Abstract: 

Over the course of the pandemic, several factors converged to create an amazing problem at Campspot: more traffic! Increased load stressed our applications, and unpleasant customer-facing incidents stressed our engineering teams. In response, we doubled down on existing tools and processes: increased alerting, beefed up on-call rotations, more dashboards, and more high-urgency Slack channels. We put spotlights on so many areas of the system that it became hard to see where the issues were.

Recognizing the chaos, we pivoted in Spring 2021 to unify teams around a single observability tool and implemented Service Level Objectives. The result: fewer alerts, faster troubleshooting, and clearer indicators of when to focus on performance vs. features. Come hear how we cleared out the alert pollution so we could see the constellations we were actually searching for all along. If you're building a case for the move to observability, this talk is for you.
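The abstract does not describe Campspot's actual SLOs, tools or thresholds. As a generic illustration of how SLOs can replace many raw-metric alerts, here is a minimal sketch of error-budget and burn-rate arithmetic. An error budget can also signal "when to focus on performance vs. features": while budget remains, teams ship features, and when it runs low, reliability work takes priority.

The sketch makes several assumptions:
- All names (`Window`, `error_budget_remaining`, `burn_rate`, `should_page`) are hypothetical.
- The page threshold of 14.4, evaluated over a 1-hour and a 5-minute window, is an assumed starting point. It is the example commonly given in the Google SRE Workbook for a 30-day SLO.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one time window (hypothetical type)."""
    total: int  # all requests in the window
    bad: int    # requests that violated the SLI (e.g. errors or slow responses)


def _check_target(slo_target: float) -> None:
    # An SLO target of 100% leaves no error budget, so it is rejected here.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")


def error_budget_remaining(slo_target: float, period: Window) -> float:
    """Fraction of the period's error budget still unspent (negative if overspent)."""
    _check_target(slo_target)
    allowed_bad = (1.0 - slo_target) * period.total
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - period.bad / allowed_bad


def burn_rate(slo_target: float, window: Window) -> float:
    """How fast the budget is being consumed; 1.0 means exactly on budget."""
    _check_target(slo_target)
    if window.total == 0:
        return 0.0
    error_rate = window.bad / window.total
    return error_rate / (1.0 - slo_target)


def should_page(slo_target: float, long_window: Window, short_window: Window,
                threshold: float = 14.4) -> bool:
    """Page only if BOTH windows burn faster than the (assumed) threshold."""
    return (burn_rate(slo_target, long_window) >= threshold
            and burn_rate(slo_target, short_window) >= threshold)


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9% availability target
    month = Window(total=1_000_000, bad=400)
    print(f"budget left: {error_budget_remaining(slo, month):.0%}")
    # 2% errors in both the last hour and the last 5 minutes: a fast burn.
    print(should_page(slo, Window(10_000, 200), Window(1_000, 20)))
    # A handful of errors in the last hour: no page, unlike a raw error alert.
    print(should_page(slo, Window(10_000, 5), Window(1_000, 0)))
```

Requiring both windows to exceed the threshold is deliberate. The long window keeps short blips from paging anyone. The short window lets the alert clear soon after an incident ends. One SLO-based rule of this kind can stand in for many per-metric alerts.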


Kristin Smith (she/her) serves as a DevOps Services Team Lead for a distributed team of cloud and data engineers at Campspot. She transitioned into the tech industry seven years ago, bringing with her a background in history and archival sciences. Along the way she has worked in technical organizations ranging from three people to over 700, in both the private and public spheres. Her professional interests include infrastructure provisioning, monitoring and traceability in distributed systems, and writing documentation that people actually read.

SREcon22 Americas Open Access Sponsored by Blameless

BibTeX
@conference {278134,
author = {Kristin Smith},
title = {Dark Sky Camping: Reducing Alert Pollution with Modern Observability Practices},
year = {2022},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}
