Observing from Incidents

Monday, December 07, 2020 - 1:50 pm–2:30 pm

Cory Watson

Abstract: 

Despite thousands of squawking alerts and a morass of dashboards, our complex systems remain firmly mysterious. Incidents continue to pop up in places that, frankly, they should not. In this talk, we'll leverage techniques from dozens of companies to learn from successes and failures, spread that hard-earned knowledge via observability and visualizations, and productize the process internally to drive down incident impact, improve customer experience, and reduce stress.

Cory Watson

Cory Watson is an engineer at Stripe, leading high-impact, customer-focused projects around reliability. Cory started his journey of observability as an SRE at Twitter, founded the observability team at Stripe, and spent time at vendors SignalFx and Splunk. He is a strong voice in the observability community, through OSS, popular tweets, blog posts, and speaking engagements.

Cory has over 20 years of software engineering experience and is an active founder/contributor of several successful Open Source projects. Before finding his passion for reliability, he worked in several industries such as e-commerce, consulting, healthcare, and fintech.

BibTeX
@inproceedings {262265,
author = {Cory Watson},
title = {Observing from Incidents},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {https://www.usenix.org/conference/srecon20americas/presentation/watson},
publisher = {{USENIX} Association},
month = dec,
}