Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure

Authors: 

Qifan Pu, UC Berkeley; Shivaram Venkataraman, University of Wisconsin, Madison; Ion Stoica, UC Berkeley

Abstract: 

Serverless computing is poised to fulfill the long-held promise of transparent elasticity and millisecond-level pricing. To achieve this goal, service providers impose a finegrained computational model where every function has a maximum duration, a fixed amount of memory and no persistent local storage. We observe that the fine-grained elasticity of serverless is key to achieve high utilization for general computations such as analytics workloads, but that resource limits make it challenging to implement such applications as they need to move large amounts of data between functions that don’t overlap in time. In this paper, we present Locus, a serverless analytics system that judiciously combines (1) cheap but slow storage with (2) fast but expensive storage, to achieve good performance while remaining cost-efficient. Locus applies a performance model to guide users in selecting the type and the amount of storage to achieve the desired cost-performance trade-off. We evaluate Locus on a number of analytics applications including TPC-DS, CloudSort, Big Data Benchmark and show that Locus can navigate the cost-performance trade-off, leading to 4×-500× performance improvements over slow storage-only baseline and reducing resource usage by up to 59% while achieving comparable performance on a cluster of virtual machines, and within 1.99× slower compared to Redshift.

NSDI '19 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {227653,
author = {Qifan Pu and Shivaram Venkataraman and Ion Stoica},
title = {Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure},
booktitle = {16th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 19)},
year = {2019},
isbn = {978-1-931971-49-2},
address = {Boston, MA},
pages = {193--206},
url = {https://www.usenix.org/conference/nsdi19/presentation/pu},
publisher = {{USENIX} Association},
}