Building Scalable and Flexible Cluster Managers Using Declarative Programming

Authors: 

Lalith Suresh, VMware; João Loff, IST (ULisboa) / INESC-ID; Faria Kalim, UIUC; Sangeetha Abdu Jyothi, UC Irvine and VMware; Nina Narodytska, Leonid Ryzhyk, Sahan Gamage, Brian Oki, Pranshu Jain, and Michael Gasch, VMware

Abstract: 

Cluster managers like Kubernetes and OpenStack are notoriously hard to develop, given that they routinely grapple with hard combinatorial optimization problems like load balancing, placement, scheduling, and configuration. Today, cluster manager developers tackle these problems by developing system-specific best effort heuristics, which achieve scalability by significantly sacrificing the cluster manager's decision quality, feature set, and extensibility over time. This is proving untenable, as solutions for cluster management problems are routinely developed from scratch in the industry to solve largely similar problems across different settings.

We propose DCM, a radically different architecture where developers specify the cluster manager's behavior declaratively, using SQL queries over cluster state stored in a relational database. From the SQL specification, the DCM compiler synthesizes a program that, at runtime, can be invoked to compute policy-compliant cluster management decisions given the latest cluster state. Under the covers, the generated program efficiently encodes the cluster state as an optimization problem that can be solved using off-the-shelf solvers, freeing developers from having to design ad-hoc heuristics.

We show that DCM significantly lowers the barrier to building scalable and extensible cluster managers. We validate our claim by powering three production-grade systems with it: a Kubernetes scheduler, a virtual machine management solution, and a distributed transactional datastore.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {258844,
author = {Lalith Suresh and Jo{\~a}o Loff and Faria Kalim and Sangeetha Abdu Jyothi and Nina Narodytska and Leonid Ryzhyk and Sahan Gamage and Brian Oki and Pranshu Jain and Michael Gasch},
title = {Building Scalable and Flexible Cluster Managers Using Declarative Programming},
booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
year = {2020},
isbn = {978-1-939133-19-9},
pages = {827--844},
url = {https://www.usenix.org/conference/osdi20/presentation/suresh},
publisher = {{USENIX} Association},
month = nov,
}
Artifact Evaluated

Presentation Video