Accelerating the Machine Learning Lifecycle with MLflow

Andrew Chen, Databricks


ML development brings many complexities beyond those of traditional software development. ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce their work. In addition, they need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.

In this session, we introduce MLflow, an open-source ML platform started by Databricks in 2018 that is designed to integrate easily with arbitrary ML libraries, deployment tools, and workflows. MLflow introduces simple abstractions for packaging reproducible pipelines, tracking results, and encapsulating models, which together streamline sharing and productionizing ML. The project has a fast-growing open-source community, with 80 contributors from over 40 companies, and integrations with Python, Java, R, and dozens of ML libraries and services. We show how to set up MLflow and execute various workflows in it based on best practices from current users.

@conference {233005,
title = {Accelerating the Machine Learning Lifecycle with MLflow},
year = {2019},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = may,