Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft


Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, and Yuxiong He, Microsoft AI and Research


The application of deep learning models has brought significant improvements to many Microsoft services and products. In this paper, we describe our experience and methodology in developing the DeepCPU library and applying it to serve deep learning models in production at large scale, achieving substantial latency improvements and infrastructure cost reductions. We describe two ways to use the library, customized optimization and framework integration, each targeting different scenarios.


@inproceedings {232995,
author = {Minjia Zhang and Samyam Rajbhandari and Wenhan Wang and Elton Zheng and Olatunji Ruwase and Jeff Rasley and Jason Li and Junhua Wang and Yuxiong He},
title = {Accelerating Large Scale Deep Learning Inference through {DeepCPU} at Microsoft},
booktitle = {2019 USENIX Conference on Operational Machine Learning (OpML 19)},
year = {2019},
isbn = {978-1-939133-00-7},
address = {Santa Clara, CA},
pages = {5--7},
url = {},
publisher = {USENIX Association},
month = may
}