PerfIso: Performance Isolation for Commercial Latency-Sensitive Services

Authors: 

Călin Iorgulescu, EPFL; Reza Azimi, Brown University; Youngjin Kwon, U. Texas at Austin; Sameh Elnikety, Manoj Syamala, and Vivek Narasayya, Microsoft Research; Herodotos Herodotou, Cyprus University of Technology; Paulo Tomita, Alex Chen, Jack Zhang, and Junhua Wang, Microsoft Bing

Abstract: 

Large commercial latency-sensitive services, such as web search, run on dedicated clusters provisioned for peak load to ensure responsiveness and tolerate data center outages. As a result, the average load is far lower than the peak load used for provisioning, leading to resource under-utilization. The idle resources can be used to run batch jobs, completing useful work and reducing overall data center provisioning costs. However, this is challenging in practice due to the complexity and stringent tail-latency requirements of latency-sensitive services. Left unmanaged, the competition for machine resources can lead to severe response-time degradation and unmet service-level objectives (SLOs).

This work describes PerfIso, a performance isolation framework which has been used for nearly three years in Microsoft Bing, a major search engine, to colocate batch jobs with production latency-sensitive services on over 90,000 servers. We discuss the design and implementation of PerfIso, and conduct an experimental evaluation in a production environment. We show that colocating CPU-intensive jobs with latency-sensitive services increases average CPU utilization from 21% to 66% for off-peak load without impacting tail latency.

BibTeX
@inproceedings {215989,
author = {Calin Iorgulescu and Reza Azimi and Youngjin Kwon and Sameh Elnikety and Manoj Syamala and Vivek Narasayya and Herodotos Herodotou and Paulo Tomita and Alex Chen and Jack Zhang and Junhua Wang},
title = {PerfIso: Performance Isolation for Commercial Latency-Sensitive Services},
booktitle = {2018 {USENIX} Annual Technical Conference ({USENIX} {ATC} 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Boston, MA},
pages = {519--532},
url = {https://www.usenix.org/conference/atc18/presentation/iorgulescu},
publisher = {{USENIX} Association},
month = jul,
}
