Observability in the MLOps Lifecycle with Prometheus

Wednesday, June 14, 2023 - 12:10 pm to 12:35 pm

Shivay Lamba


MLOps is widely discussed and used to make deploying, managing, and monitoring ML models in production easier. Monitoring ML training and evaluation jobs is important; however, monitoring matters even more once a model is deployed.

This talk starts with a gentle introduction to how ML deployments should be monitored, briefly covering edge cases in production, data drift, concept drift, and model metrics, as well as the standard system and resource metrics. We give the audience an overview of observability and monitoring in the context of MLOps. This monitoring can also indicate whether a model should be retrained, whether more data should be collected, whether different kinds of data should be collected, and more.
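As a rough illustration of what a data drift signal can look like, the sketch below computes a population stability index (PSI) between a reference sample of one feature and a live sample. This is one common drift statistic chosen here for illustration, not necessarily the one used in the talk; the function name, the bin count, and the smoothing constant `eps` are assumptions, and both samples are assumed to be non-empty.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,      # assumed bin count, not from the talk
    eps: float = 1e-6,   # assumed floor to avoid log(0) on empty bins
) -> float:
    """Return the PSI of `live` relative to `reference` for one feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if all
    # reference values are identical.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_frac = fractions(reference)
    live_frac = fractions(live)
    # Each term is non-negative, so PSI is 0 only when the binned
    # distributions match.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, live_frac))
```

A score near zero suggests the live inputs still resemble the reference data, while a growing score is one possible trigger for the retraining or data collection decisions described above.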

We show how to handle this monitoring, and the decisions it informs, in the context of MLOps with Prometheus. We also show how to add easy and useful Prometheus monitoring to existing deployments. Finally, we show demos of how Prometheus can be paired with Flyte, Seldon Core, or FastAPI ML deployments.
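To make the idea of adding Prometheus monitoring to an existing deployment concrete, here is a minimal sketch of a `/metrics` endpoint that renders two model metrics in the Prometheus text exposition format. It uses only the Python standard library to show what Prometheus actually scrapes; a real deployment would more likely use the official client library. The metric names, the `ModelMetrics` class, and the port are hypothetical and are not taken from the talk's demos.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ModelMetrics:
    """Thread-safe holder for a counter and a gauge (hypothetical names)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.predictions_total = 0
        self.last_drift_score = 0.0

    def record_prediction(self) -> None:
        with self._lock:
            self.predictions_total += 1

    def set_drift_score(self, score: float) -> None:
        with self._lock:
            self.last_drift_score = float(score)

    def render(self) -> str:
        """Render the metrics in the Prometheus text exposition format."""
        with self._lock:
            lines = [
                "# HELP model_predictions_total Predictions served by the model.",
                "# TYPE model_predictions_total counter",
                f"model_predictions_total {self.predictions_total}",
                "# HELP model_feature_drift_score Latest input drift score.",
                "# TYPE model_feature_drift_score gauge",
                f"model_feature_drift_score {self.last_drift_score}",
            ]
        return "\n".join(lines) + "\n"


METRICS = ModelMetrics()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve /metrics on the given port (assumed port)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

The model-serving code would call `METRICS.record_prediction()` on each request and `METRICS.set_drift_score(...)` whenever a drift check runs; running `serve()` in a background thread and pointing a Prometheus scrape job at that port would then collect both series.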

Shivay Lamba

Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an open source enthusiast who has taken part in programs such as Google Code-in and Google Summer of Code as a mentor, and has also been an MLH Fellow. He is actively involved in community work as well: he is a TensorFlow.js SIG member and a mentor in OpenMined, the CNCF Service Mesh Community, and the SODA Foundation, and has given talks at conferences such as GitHub Satellite, Voice Global, FOSSASIA Tech Summit, and the TensorFlow.js Show & Tell.
