Firmament: Fast, Centralized Cluster Scheduling at Scale

Authors: 

Ionel Gog, University of Cambridge; Malte Schwarzkopf, MIT CSAIL; Adam Gleave and Robert N. M. Watson, University of Cambridge; Steven Hand, Google, Inc.

Abstract: 

Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization.

This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub-second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations.

Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20x over Quincy, a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6x.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {199390,
author = {Ionel Gog and Malte Schwarzkopf and Adam Gleave and Robert N. M. Watson and Steven Hand},
title = {Firmament: Fast, Centralized Cluster Scheduling at Scale},
booktitle = {12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)},
year = {2016},
isbn = {978-1-931971-33-1},
address = {Savannah, GA},
pages = {99--115},
url = {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/gog},
publisher = {USENIX Association},
month = nov
}

Presentation Audio