Cluster Serving: Distributed Model Inference using Big Data Streaming in Analytics Zoo

Jiaming Song, Dongjie Shi, Qiyuan Gong, Lei Xia, and Jason Dai, Intel


As deep learning projects evolve from experimentation to production, there is increasing demand to deploy deep learning models for large-scale, real-time distributed inference. While many tools are available for the relevant tasks (such as model optimization, serving, cluster scheduling, and workflow management), it is still challenging for many deep learning engineers and scientists to develop and deploy a distributed inference workflow that can scale out to large clusters in a transparent fashion.

To address this challenge, we have developed Cluster Serving, an automated and distributed serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL, and OpenVINO models). It provides simple publish-subscribe (pub/sub) and REST APIs, through which users can send their inference requests to the input queue from Python or over HTTP. Cluster Serving then automatically manages the scale-out and real-time model inference across a large cluster, using distributed Big Data streaming frameworks (such as Apache Spark Streaming and Apache Flink).
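
To make the request flow concrete, here is a minimal, in-process Python sketch of the pattern described above: clients enqueue requests on an input queue, and a pool of workers pulls from that queue and runs inference in parallel. It does not use the Cluster Serving API. The names `ToyServing`, `fake_model`, `enqueue`, and `query` are hypothetical, the worker threads merely stand in for a distributed streaming job, and the "model" is a placeholder function.

```python
import queue
import threading
import uuid


def fake_model(x):
    # Hypothetical stand-in for a real deep learning model: sums the input vector.
    return sum(x)


class ToyServing:
    """In-process illustration of an input queue feeding parallel inference workers.

    This is an assumption-laden sketch of the pub/sub pattern, not Cluster Serving itself.
    """

    def __init__(self, model, num_workers=2):
        self.model = model
        self.input_queue = queue.Queue()   # requests published by clients
        self.results = {}                  # request id -> prediction
        self.results_lock = threading.Lock()
        # Worker threads stand in for the nodes of a streaming cluster.
        self.workers = [
            threading.Thread(target=self._serve, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self.workers:
            worker.start()

    def enqueue(self, data):
        # Publish one inference request and return an id for later lookup.
        request_id = str(uuid.uuid4())
        self.input_queue.put((request_id, data))
        return request_id

    def _serve(self):
        # Each worker repeatedly takes a request, runs the model, stores the result.
        while True:
            request_id, data = self.input_queue.get()
            try:
                prediction = self.model(data)
                with self.results_lock:
                    self.results[request_id] = prediction
            finally:
                self.input_queue.task_done()

    def wait_for_all(self):
        # Block until every enqueued request has been processed.
        self.input_queue.join()

    def query(self, request_id):
        # Look up the prediction for a request id (None if not available).
        with self.results_lock:
            return self.results.get(request_id)


serving = ToyServing(fake_model, num_workers=4)
ids = [serving.enqueue([i, i + 1]) for i in range(5)]
serving.wait_for_all()
predictions = [serving.query(request_id) for request_id in ids]
print(predictions)  # [1, 3, 5, 7, 9]
```

The client only ever touches `enqueue` and `query`, so the number of workers can change without any change to client code, which is the sense in which scale-out is transparent in this toy version.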

In this talk, we will present the architecture design for Cluster Serving and discuss the underlying design patterns and tradeoffs involved in deploying deep learning models on distributed Big Data streaming frameworks in production. We will also share real-world experience and "war stories" from users who have adopted Cluster Serving to develop and deploy distributed inference workflows.

Jiaming Song, Intel

Mr. Song Jiaming is a machine learning engineer at Intel with over two years of experience in machine learning and big data. He is a key contributor to the open source Big Data + AI project Analytics Zoo and is now focusing on the development of Cluster Serving.

Jason Dai, Intel

Jason Dai is a senior principal engineer and CTO of Big Data Technologies at Intel, responsible for leading the global engineering teams (in both Silicon Valley and Shanghai) on the development of advanced data analytics and machine learning. He is the creator of BigDL and Analytics Zoo, a founding committer and PMC member of Apache Spark, and a mentor of Apache MXNet. For more details, please see

OpML '20 Open Access Sponsored by NetApp

@conference {256660,
author = {Jiaming Song and Dongjie Shi and Qiyuan Gong and Lei Xia and Jason Dai},
title = {Cluster Serving: Distributed Model Inference using Big Data Streaming in Analytics Zoo},
year = {2020},
publisher = {{USENIX} Association},
month = jul,
}