Clipper: A Low-Latency Online Prediction Serving System


Daniel Crankshaw, Xin Wang, and Guilio Zhou, University of California, Berkeley; Michael J. Franklin, University of California, Berkeley, and The University of Chicago; Joseph E. Gonzalez and Ion Stoica, University of California, Berkeley


Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.

In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the Tensorflow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

Presentation Video

Download Video

Presentation Audio

@inproceedings {201468,
author = {Daniel Crankshaw and Xin Wang and Guilio Zhou and Michael J. Franklin and Joseph E. Gonzalez and Ion Stoica},
title = {Clipper: A Low-Latency Online Prediction Serving System},
booktitle = {14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17)},
year = {2017},
isbn = {978-1-931971-37-9},
address = {Boston, MA},
pages = {613--627},
url = {},
publisher = {{USENIX} Association},