Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform

Denis Baylor; Kevin Haas; Konstantinos Katsiapis; Sammy Leong; Rose Liu; Clemens Menwald; Hui Miao; Neoklis Polyzotis; Mitchell Trott; Martin Zinkevich

Authors:

Denis Baylor, Kevin Haas, Konstantinos Katsiapis, Sammy Leong, Rose Liu, Clemens Menwald, Hui Miao, Neoklis Polyzotis, Mitchell Trott, and Martin Zinkevich, Google Research

Abstract:

Large organizations increasingly rely on continuous ML pipelines to keep machine-learned models up to date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of the downstream services that these models support. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform, which we developed and deployed at Google. We present the main mechanisms in TFX that support this type of pipeline in production, and the lessons learned from deploying the platform internally at Google.

BibTeX

@inproceedings {232989,
author = {Denis Baylor and Kevin Haas and Konstantinos Katsiapis and Sammy Leong and Rose Liu and Clemens Menwald and Hui Miao and Neoklis Polyzotis and Mitchell Trott and Martin Zinkevich},
title = {Continuous Training for Production {ML} in the {TensorFlow} Extended ({TFX}) Platform},
booktitle = {2019 USENIX Conference on Operational Machine Learning (OpML 19)},
year = {2019},
isbn = {978-1-939133-00-7},
address = {Santa Clara, CA},
pages = {51--53},
url = {https://www.usenix.org/conference/opml19/presentation/baylor},
publisher = {USENIX Association},
month = may
}