Resolving Outages Faster with Better Debugging Strategies

Thursday, March 29, 2018 - 9:15 am to 9:55 am

Liz Fong-Jones and Adam Mckaig, Google

Abstract: 

Engineers spend a lot of time building dashboards to improve monitoring, yet when they get paged they still spend a lot of time figuring out what is going on and how to fix it. Building more dashboards isn’t the solution; using dynamic query evaluation and integrating tracing is.

Liz Fong-Jones, Google

Liz Fong-Jones is a Staff Site Reliability Engineer at Google and works on the Google Cloud Customer Reliability Engineering team in New York. She lives with her wife, metamour, and two Samoyeds in Brooklyn. In her spare time, she plays classical piano, leads an EVE Online alliance, and advocates for transgender rights.

Adam Mckaig, Google

Adam Mckaig is an SRE at Google in New York, where he looks after a monitoring system. Previously he built things at the New York Times, Bloomberg, and UNICEF. He enjoys C++, which probably says it all.

SREcon18 Americas Open Access Videos Sponsored by Indeed

BibTeX
@conference {213096,
author = {Liz Fong-Jones and Adam Mckaig},
title = {Resolving Outages Faster with Better Debugging Strategies},
year = {2018},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
}