Providing SLOs for Resource-Harvesting VMs in Cloud Platforms

Authors: 

Pradeep Ambati, Íñigo Goiri, and Felipe Frujeri, Microsoft Research; Alper Gun, Ke Wang, Brian Dolan, Brian Corell, Sekhar Pasupuleti, and Thomas Moscibroda, Microsoft Azure; Sameh Elnikety, Microsoft Research; Marcus Fontoura, Microsoft Azure; Ricardo Bianchini, Microsoft Research

Abstract: 

Cloud providers rent the resources they do not allocate as evictable virtual machines (VMs), like spot instances. In this paper, we first characterize the unallocated resources in Microsoft Azure, and show that they are plentiful but may vary widely over time and across servers. Based on the characterization, we propose a new class of VM, called Harvest VM, to harvest and monetize the unallocated resources. A Harvest VM is more flexible and efficient than a spot instance, because it grows and shrinks according to the amount of unallocated resources at its underlying server; it is only evicted/killed when the provider needs its minimum set of resources. Next, we create models that predict the availability of the unallocated resources for Harvest VM deployments. Based on these predictions, we provide Service Level Objectives (SLOs) for the survival rate (e.g., 65% of the Harvest VMs will survive more than a week) and the average number of cores that can be harvested. Our predictions have an average error under 2% for the short term and under 6% for longer terms. We also extend a popular cluster scheduling framework to leverage the harvested resources. Using our SLOs and framework, we can offset the rare evictions with extra harvested cores and achieve the same computational power as regular-priority VMs, but at 91% lower cost. Finally, we outline lessons and results from running Harvest VMs and our framework in production.
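The grow, shrink, and evict behavior described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `HarvestVM`, `min_cores`, and `resize` are hypothetical, and the sketch assumes, for simplicity, that a single Harvest VM per server receives every core not allocated to regular VMs.

```python
from dataclasses import dataclass


@dataclass
class HarvestVM:
    """Hypothetical model of a Harvest VM (names are illustrative only)."""
    min_cores: int         # minimum size; below this the VM cannot keep running
    cores: int = 0         # cores currently assigned to the VM
    evicted: bool = False  # set once the provider reclaims the minimum size


def resize(vm: HarvestVM, server_cores: int, regular_cores: int) -> HarvestVM:
    """Grow or shrink the VM to the server's unallocated cores, or evict it.

    Assumption: the VM harvests all cores not allocated to regular VMs.
    """
    if vm.evicted:
        return vm  # an evicted VM does not come back in this sketch
    unallocated = server_cores - regular_cores
    if unallocated < vm.min_cores:
        # The provider needs part of the VM's minimum set of resources.
        vm.cores = 0
        vm.evicted = True
    else:
        vm.cores = unallocated
    return vm


vm = HarvestVM(min_cores=2)
resize(vm, server_cores=48, regular_cores=16)  # grows to 32 cores
resize(vm, server_cores=48, regular_cores=40)  # shrinks to 8 cores
resize(vm, server_cores=48, regular_cores=47)  # 1 core left: evicted
```

In this sketch, a spot instance would correspond to a VM whose size never changes, so any shortfall leads straight to eviction; the Harvest VM instead absorbs most fluctuations by shrinking.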

BibTeX
@inproceedings {258967,
author = {Pradeep Ambati and Inigo Goiri and Felipe Frujeri and Alper Gun and Ke Wang and Brian Dolan and Brian Corell and Sekhar Pasupuleti and Thomas Moscibroda and Sameh Elnikety and Marcus Fontoura and Ricardo Bianchini},
title = {Providing SLOs for Resource-Harvesting VMs in Cloud Platforms},
booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
year = {2020},
isbn = {978-1-939133-19-9},
pages = {735--751},
url = {https://www.usenix.org/conference/osdi20/presentation/ambati},
publisher = {{USENIX} Association},
month = nov,
}
