Jordan Li and Ivan Ryabov, Goldman Sachs
SLIs and SLOs are fundamental building blocks for any SRE organization. They create a shared language for communicating service reliability. However, applying them in practice and at scale poses some interesting challenges:
- Challenges for Developers
  - How to move from telemetry to user-centric metrics
  - How to implement a monitoring strategy to measure SLIs for different types of services
  - How to define an SLO and the math behind it
- Challenges for Stakeholders
  - How to interpret and evaluate an SLO
At Goldman Sachs, we tackle this problem with tooling: "SLO Repository", a tool we built with open-source technologies to drive SLO adoption, make SLOs discoverable so that they feed an ongoing feedback loop, and make them easy for stakeholders to understand.
This talk will convey Goldman Sachs' key lessons learned while driving SLO adoption across the organization and showcase the ecosystem we built around SLO adoption.
Jordan Li, Goldman Sachs
Jordan Li is a Software Engineer and Site Reliability Engineer at Goldman Sachs, where he focuses on building tools for observability and SLO adoption. Prior to that, he was a network engineer and cloud engineer at HKT. Outside his day job, you can find him doing card tricks.
Ivan Ryabov, Goldman Sachs
Ivan Ryabov is a software engineer turned SRE.
author = {Jordan Li and Ivan Ryabov},
title = {Our Experience Tracking and Driving {SLO} Adoption at Goldman Sachs},
year = {2022},
address = {Sydney},
publisher = {USENIX Association},
month = dec
}