Is Big Data Performance Reproducible in Modern Cloud Networks?

Authors: 

Alexandru Uta and Alexandru Custura, Vrije Universiteit Amsterdam; Dmitry Duplyakin, University of Utah; Ivo Jimenez, UC Santa Cruz; Jan Rellermeyer, TU Delft; Carlos Maltzahn, UC Santa Cruz; Robert Ricci, University of Utah; Alexandru Iosup, Vrije Universiteit Amsterdam

Abstract: 

Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.

NSDI '20 Open Access Sponsored by NetApp


BibTeX
@inproceedings {246352,
author = {Alexandru Uta and Alexandru Custura and Dmitry Duplyakin and Ivo Jimenez and Jan Rellermeyer and Carlos Maltzahn and Robert Ricci and Alexandru Iosup},
title = {Is Big Data Performance Reproducible in Modern Cloud Networks?},
booktitle = {17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)},
year = {2020},
isbn = {978-1-939133-13-7},
address = {Santa Clara, CA},
pages = {513--527},
url = {https://www.usenix.org/conference/nsdi20/presentation/uta},
publisher = {USENIX Association},
month = feb
}
