Unified Reporting of Service Reliability

Thursday, June 13, 2019 - 5:30 pm–6:00 pm

Helen Zhang, Google


We built a unified reporting system that brings together data from sources that previously lived in unconnected silos, such as SLO reporting metrics, postmortems, incident response tools, and customer support tickets. The system ingests and correlates data from these sources and stores the processed data in a new database. People from a variety of teams can then use this data to build customized dashboards that suit their particular reporting needs.
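The abstract does not say how records from different sources are correlated or what the storage schema looks like. As a minimal sketch, assuming every source can be normalized to a common record carrying a service name and a start time, one simple approach is to group events for the same service that start within a fixed time window, then write the groups to a table that dashboards can query. The `Event` record, the `correlate` and `store` helpers, the one-hour window, and the `correlated` table are all hypothetical; the in-memory `sqlite3` database only stands in for the unspecified "new database".

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical normalized record: each source (SLO metrics, postmortems,
# incident tools, support tickets) is assumed to be mapped onto this shape.
@dataclass(frozen=True)
class Event:
    source: str      # e.g. "slo", "postmortem", "ticket"
    service: str     # service the event refers to
    start: datetime  # when the event began
    ref: str         # source-specific identifier


def correlate(events, window=timedelta(hours=1)):
    """Group events for the same service whose start times fall within
    `window` of the first event in the group (an assumed heuristic)."""
    by_service = {}
    for ev in sorted(events, key=lambda e: (e.service, e.start)):
        by_service.setdefault(ev.service, []).append(ev)

    groups = []
    for evs in by_service.values():
        current = [evs[0]]
        for ev in evs[1:]:
            if ev.start - current[0].start <= window:
                current.append(ev)
            else:
                groups.append(current)
                current = [ev]
        groups.append(current)
    return groups


def store(groups, conn):
    """Persist correlated groups into a hypothetical `correlated` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS correlated ("
        "group_id INTEGER, source TEXT, service TEXT, start TEXT, ref TEXT)"
    )
    for gid, group in enumerate(groups):
        conn.executemany(
            "INSERT INTO correlated VALUES (?, ?, ?, ?, ?)",
            [(gid, e.source, e.service, e.start.isoformat(), e.ref) for e in group],
        )
    conn.commit()


# Illustrative usage with made-up events.
t0 = datetime(2019, 6, 13, 10, 0)
events = [
    Event("slo", "checkout", t0, "slo-burn-1"),
    Event("ticket", "checkout", t0 + timedelta(minutes=20), "T-101"),
    Event("postmortem", "checkout", t0 + timedelta(minutes=5), "PM-7"),
    Event("ticket", "search", t0 + timedelta(minutes=10), "T-102"),
    Event("slo", "checkout", t0 + timedelta(hours=3), "slo-burn-2"),
]
groups = correlate(events)

conn = sqlite3.connect(":memory:")
store(groups, conn)
# A dashboard-style query: how many records from all sources per incident group.
summary = conn.execute(
    "SELECT MIN(service), COUNT(*) FROM correlated "
    "GROUP BY group_id ORDER BY group_id"
).fetchall()
```

Anchoring the window to the first event in a group keeps groups bounded in length; a real system would more likely join on explicit incident or ticket IDs where the sources share them, and fall back to time proximity only when they do not.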


Helen Zhang is a staff software engineer in Google SRE. During her nine years at Google, she has worked with hundreds of developers across the company to launch mission-critical production services. She recently led a team that built a unified reporting system for service reliability.

@conference {233331,
author = {Helen Zhang},
title = {Unified Reporting of Service Reliability},
year = {2019},
address = {Singapore},
publisher = {{USENIX} Association},