On the diversity of cluster workloads and its impact on research results

Authors: 

George Amvrosiadis, Jun Woo Park, Gregory R. Ganger, and Garth A. Gibson, Carnegie Mellon University; Elisabeth Baseman and Nathan DeBardeleben, Los Alamos National Laboratory

Abstract: 

Six years ago, Google released an invaluable set of scheduler logs which has already been used in more than 450 publications. We find that the scarcity of other data sources, however, is leading researchers to overfit their work to Google's dataset characteristics. We demonstrate this overfitting by introducing four new traces from two private and two High Performance Computing (HPC) clusters. Our analysis shows that the private cluster workloads, consisting of data analytics jobs expected to be more closely related to the Google workload, display more similarity to the HPC cluster workloads. This observation suggests that additional traces should be considered when evaluating the generality of new research.

To aid the community in moving forward, we release the four analyzed traces, including: the longest publicly available trace spanning all 61 months of an HPC cluster's lifetime and a trace from a 300,000-core HPC cluster, the largest cluster with a publicly available trace. We present an analysis of the private and HPC cluster traces that spans job characteristics, workload heterogeneity, resource utilization, and failure rates. We contrast our findings with the Google trace characteristics and identify affected work in the literature. Finally, we demonstrate the importance of dataset plurality and diversity by evaluating the performance of a job runtime predictor using all four of our traces and the Google trace.

BibTeX
@inproceedings {215941,
author = {George Amvrosiadis and Jun Woo Park and Gregory R. Ganger and Garth A. Gibson and Elisabeth Baseman and Nathan DeBardeleben},
title = {On the diversity of cluster workloads and its impact on research results},
booktitle = {2018 USENIX Annual Technical Conference (USENIX ATC 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Boston, MA},
pages = {533--546},
url = {https://www.usenix.org/conference/atc18/presentation/amvrosiadis},
publisher = {USENIX Association},
month = jul
}
