Model-Switching: Dealing with Fluctuating Workloads in Machine-Learning-as-a-Service Systems

Authors: 

Jeff Zhang, New York University; Sameh Elnikety, Microsoft Research; Shuayb Zarar and Atul Gupta, Microsoft; Siddharth Garg, New York University

Abstract: 

Machine learning (ML) based prediction models, especially deep neural networks (DNNs), are increasingly being served in the cloud in order to provide fast and accurate inferences. However, existing ML serving systems have trouble dealing with fluctuating workloads and either drop requests or significantly expand hardware resources in response to load spikes. In this paper, we introduce Model-Switching, a new approach to dealing with fluctuating workloads when serving DNN models. Motivated by the observation that end-users of ML primarily care about the accuracy of responses that are returned within the deadline (which we refer to as effective accuracy), we propose to switch from complex and highly accurate DNN models to simpler but less accurate models in the presence of load spikes. We show that the flexibility introduced by enabling online model switching provides higher effective accuracy in the presence of fluctuating workloads compared to serving using any single model. We implement Model-Switching within Clipper, a state-of-the-art DNN model serving system, and demonstrate its advantages over baseline approaches.
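
To make the idea of effective accuracy concrete, the following is a minimal sketch of a switching rule, not the authors' implementation. It assumes a crude capacity model in which any load beyond what the workers can serve misses the deadline, and it estimates effective accuracy as offline accuracy multiplied by the on-time fraction. The model names, accuracy values, service times, and the helper functions `on_time_fraction`, `effective_accuracy`, and `pick_model` are all hypothetical and do not come from the paper or from Clipper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float        # offline accuracy in [0, 1] (illustrative value)
    service_time_s: float  # mean time to serve one request on one worker


def on_time_fraction(model: Model, load_rps: float, workers: int) -> float:
    """Assumed capacity model: requests beyond capacity miss the deadline."""
    if load_rps <= 0:
        return 1.0
    capacity_rps = workers / model.service_time_s
    return min(1.0, capacity_rps / load_rps)


def effective_accuracy(model: Model, load_rps: float, workers: int) -> float:
    """Estimated share of requests answered correctly within the deadline."""
    return model.accuracy * on_time_fraction(model, load_rps, workers)


def pick_model(models: list[Model], load_rps: float, workers: int) -> Model:
    """Choose the model with the highest estimated effective accuracy."""
    return max(models, key=lambda m: effective_accuracy(m, load_rps, workers))


# Hypothetical model zoo: larger models are more accurate but slower.
MODELS = [
    Model("large", accuracy=0.78, service_time_s=0.040),
    Model("medium", accuracy=0.74, service_time_s=0.020),
    Model("small", accuracy=0.70, service_time_s=0.010),
]

if __name__ == "__main__":
    for load in (50, 150, 350):
        chosen = pick_model(MODELS, load_rps=load, workers=4)
        print(load, chosen.name, round(effective_accuracy(chosen, load, 4), 3))
```

Under these assumed numbers and four workers, the rule keeps the large model at light load and falls back to the medium and then the small model as the request rate grows, which mirrors the trade-off the abstract describes. A real system would replace the capacity model with measured latency distributions.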

BibTeX
@inproceedings {254124,
author = {Jeff Zhang and Sameh Elnikety and Shuayb Zarar and Atul Gupta and Siddharth Garg},
title = {{Model-Switching}: Dealing with Fluctuating Workloads in {Machine-Learning-as-a-Service} Systems},
booktitle = {12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 20)},
year = {2020},
url = {https://www.usenix.org/conference/hotcloud20/presentation/zhang},
publisher = {USENIX Association},
month = jul
}
