From Black Box to a Known Quantity: How to Build Predictable, Reliable ML-based Services

Wednesday, March 27, 2019 - 3:30 pm–4:00 pm

Salim Virji and Carlos Villavieja, Google LLC


Artificial intelligence is all around us, from the digital assistants in our microwaves to the apps we rely on every day. Many of these systems build on APIs and services that use machine learning to provide key features. This talk will describe techniques for building predictable, reliable ML-based services as well as ways to sustain these services through social and technical change. We discuss challenges unique to the reliability of these systems and relate our experiences with ML in our production systems to illustrate our techniques.

Salim Virji, Google LLC

Salim Virji is a Site Reliability Engineer at Google, where he has worked on distributed compute, consensus, and storage systems.

Carlos Villavieja, Google LLC

Carlos Villavieja is a Computer Architect/Researcher working as a Software/Site Reliability Engineer at Google. He works on storage optimizations, and his interests range from micro-architecture to machine learning.



@conference {229585,
author = {Salim Virji and Carlos Villavieja},
title = {From Black Box to a Known Quantity: How to Build Predictable, Reliable {ML-based} Services},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar,
}
