ScaleCheck: A Single-Machine Approach for Discovering Scalability Bugs in Large Distributed Systems

Authors: 

Cesar A. Stuardo, University of Chicago; Tanakorn Leesatapornwongsa, Samsung Research America; Riza O. Suminto, Huan Ke, and Jeffrey F. Lukman, University of Chicago; Wei-Chiu Chuang, Cloudera; Shan Lu and Haryadi S. Gunawi, University of Chicago

Abstract: 

We present ScaleCheck, an approach for discovering scalability bugs (a new class of bugs in large storage systems) and for democratizing large-scale testing. ScaleCheck employs a program analysis technique to find potential causes of scalability bugs, and a series of colocation techniques to test implementation code at real scales on just a commodity PC. ScaleCheck has been integrated into several large-scale storage systems (Cassandra, HDFS, Riak, and Voldemort) and has exposed both known and unknown scalability bugs at up to 512-node scale on a 16-core PC.
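
The abstract names two ideas: colocating many nodes on one machine, and hunting for code whose cost grows with cluster size. The toy Python sketch below is a hypothetical illustration of both, not ScaleCheck's implementation; the names `Node`, `on_gossip`, and `run_round` are invented here. It hosts N logical nodes as plain objects inside a single process and gives each one a handler that loops over every peer. One call costs O(N), so a full round across the cluster costs N * N iterations, which is the kind of growth that only shows up when the code runs at a realistic node count.

```python
"""Toy sketch: many logical nodes colocated in one process.

Hypothetical illustration only; not ScaleCheck's implementation.
"""
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    # Per-node view of cluster membership; its size grows with cluster size N.
    ring: dict = field(default_factory=dict)

    def on_gossip(self, peers):
        # A scale-dependent loop: it visits every peer, so one call
        # costs O(N). Returns the number of loop iterations performed.
        steps = 0
        for peer in peers:
            self.ring[peer.node_id] = True
            steps += 1
        return steps


def run_round(n):
    """Colocate n nodes in one process and deliver one gossip round.

    Every node processes the full peer list once, so the total work
    in a round is n * n loop iterations.
    """
    nodes = [Node(i) for i in range(n)]
    return sum(node.on_gossip(nodes) for node in nodes)


if __name__ == "__main__":
    # Doubling the node count roughly quadruples the work per round.
    for n in (32, 64, 128, 256, 512):
        print(n, run_round(n))
```

Running all nodes in one process keeps the experiment on a single PC, while still exercising the real per-node logic at the target scale. The quadratic trend is visible only when N is large enough, which is why small-cluster tests can miss such bugs.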

FAST '19 Open Access Sponsored by NetApp


BibTeX
@inproceedings {227814,
author = {Cesar A. Stuardo and Tanakorn Leesatapornwongsa and Riza O. Suminto and Huan Ke and Jeffrey F. Lukman and Wei-Chiu Chuang and Shan Lu and Haryadi S. Gunawi},
title = {{ScaleCheck}: A {Single-Machine} Approach for Discovering Scalability Bugs in Large Distributed Systems},
booktitle = {17th USENIX Conference on File and Storage Technologies (FAST 19)},
year = {2019},
isbn = {978-1-939133-09-0},
address = {Boston, MA},
pages = {359--373},
url = {https://www.usenix.org/conference/fast19/presentation/stuardo},
publisher = {USENIX Association},
month = feb
}
