Going from 30 to 30 Million SLOs

Wednesday, 26 October, 2022 - 14:45–15:30 CEST

Alex Palcuie, Google

Abstract: 

I will present the evolution of Service Level Objectives (SLOs) for the GCE Compute API over the past 6 years: starting from the initial 30 or so SLOs, going through a mid-term phase of about a thousand, and ending with millions of per-customer SLOs. I will share anecdotes, better techniques for handling low QPS (think continuous rather than discrete metrics), and ways to aggregate the data for better leadership visibility.

Alex has been working as a Site Reliability Engineer in the team that takes care of the GCE Compute API for over 5 years. He’s also been part of the team that built a control plane framework that’s now powering over 20 products in Google Cloud. His current 20% project is helping with huge outages in the Tech Incident Response Team (Tech IRT), like powering down computers in a data centre when the weather is too hot.

BibTeX
@conference {284627,
author = {Alex Palcuie},
title = {Going from 30 to 30 Million {SLOs}},
year = {2022},
address = {Amsterdam},
publisher = {USENIX Association},
month = oct
}
