SRE and ML: Why It Matters

Thursday, 27 October, 2022 - 17:00–17:45 CEST

Todd Underwood, Google


Machine Learning is an incredibly hyped set of technologies. It seems that ML is becoming an important part of distributed computing. I'll review whether SREs need to know anything about ML yet (probably you do—sorry!). And since ML reliability is challenging, I'll suggest some changes required for most SREs and even some significant changes to our profession. Finally, I'll review the state of using ML to automate production with an extremely skeptical eye.


Todd Underwood is a Senior Director at Google and the founder of Google's Machine Learning SRE team, which supports many of Google's internal ML services as well as our Cloud AI products. He is also the Site Lead for Google’s Pittsburgh office in Pennsylvania, US. He is interested in how to make computers and people work much, much better together.

@conference {284671,
author = {Todd Underwood},
title = {{SRE} and {ML}: Why It Matters},
year = {2022},
address = {Amsterdam},
publisher = {USENIX Association},
month = oct
}
