Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform


Denis Baylor, Kevin Haas, Konstantinos Katsiapis, Sammy Leong, Rose Liu, Clemens Menwald, Hui Miao, Neoklis Polyzotis, Mitchell Trott, and Martin Zinkevich, Google Research


Large organizations rely increasingly on continuous ML pipelines in order to keep machine-learned models continuously up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the Tensorflow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX to support this type of pipelines in production and the lessons learned from the deployment of the platform internally at Google.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {232989,
author = {Denis Baylor and Kevin Haas and Konstantinos Katsiapis and Sammy Leong and Rose Liu and Clemens Menwald and Hui Miao and Neoklis Polyzotis and Mitchell Trott and Martin Zinkevich},
title = {Continuous Training for Production {ML} in the {TensorFlow} Extended ({{{{{TFX}}}}}) Platform},
booktitle = {2019 USENIX Conference on Operational Machine Learning (OpML 19)},
year = {2019},
isbn = {978-1-939133-00-7},
address = {Santa Clara, CA},
pages = {51--53},
url = {},
publisher = {USENIX Association},
month = may