Whiz: Data-Driven Analytics Execution

Authors: 

Robert Grandl, Google; Arjun Singhvi, University of Wisconsin–Madison; Raajay Viswanathan, Uber Technologies Inc.; Aditya Akella, University of Wisconsin–Madison

Abstract: 

Today's data analytics frameworks are compute-centric, with analytics execution almost entirely dependent on the predetermined physical structure of the high-level computation. Relegating intermediate data to a second-class entity in this manner hurts flexibility, performance, and efficiency. We present Whiz, a new analytics execution framework that cleanly separates computation from intermediate data. This separation enables runtime visibility into intermediate data via programmable monitoring, as well as data-driven computation, in which data properties determine when computation runs and what computation runs. Experiments with a Whiz prototype on a 50-node cluster using batch, streaming, and graph analytics workloads show that it improves analytics completion times by 1.3-2x and cluster efficiency by 1.4x compared to the state of the art.

BibTeX
@inproceedings {262040,
author = {Robert Grandl and Arjun Singhvi and Raajay Viswanathan and Aditya Akella},
title = {Whiz: Data-Driven Analytics Execution},
booktitle = {18th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 21)},
year = {2021},
url = {https://www.usenix.org/conference/nsdi21/presentation/grandl},
publisher = {{USENIX} Association},
month = apr,
}