Neutrino: Revisiting Memory Caching for Iterative Data Analytics

Erci Xu, The Ohio State University; Mohit Saxena and Lawrence Chiu, IBM Almaden Research Center

In-memory analytics frameworks such as Apache Spark are rapidly gaining popularity because they provide order-of-magnitude speedups over disk-based systems for iterative workloads. For example, Spark uses the Resilient Distributed Dataset (RDD) abstraction to cache data in memory and iteratively compute on it in a distributed cluster.

In this paper, we make the case that existing abstractions such as RDD are coarse-grained and only allow discrete cache levels to be used for caching data. This results in inefficient memory utilization and lower-than-optimal performance. In addition, relying on the programmer to enforce caching decisions for an RDD makes it infeasible for the system to adapt to runtime changes. To overcome these challenges, we propose Neutrino, which employs fine-grained memory caching of RDD partitions and adapts its use of different in-memory cache levels based on runtime characteristics of the cluster. First, it extracts a data flow graph to capture the data access dependencies between RDDs across different stages of a Spark application, without relying on cache enforcement decisions from the programmer. Second, it uses a dynamic-programming-based algorithm to guide caching decisions across the cluster and adaptively convert or discard RDD partitions from the different cache levels.
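The abstract does not spell out Neutrino's algorithm, so the following is only an illustrative sketch of the two steps in plain Python, not the authors' implementation. It assumes a toy data flow description (a list of stages with their input RDDs), three hypothetical cache levels with made-up memory and access costs, and a multiple-choice knapsack formulation as a stand-in for the dynamic-programming step. All names (`count_future_reads`, `choose_cache_levels`, `LEVELS`) are invented for this example.

```python
from collections import Counter

# Hypothetical cache levels: name -> (memory units per size unit,
# access cost per read per size unit). The numbers are illustrative only.
LEVELS = {
    "discard": (0, 10),      # not cached: every read pays a recompute cost
    "serialized": (1, 3),    # compact in memory, pays a deserialization cost
    "deserialized": (2, 1),  # larger in memory, cheapest to read
}


def count_future_reads(stages):
    """Count how many stages read each RDD.

    `stages` is a list of (output_rdd, [input_rdds]) pairs in execution
    order, a simplified stand-in for a data flow graph.
    """
    reads = Counter()
    for _output, inputs in stages:
        for rdd in inputs:
            reads[rdd] += 1
    return reads


def choose_cache_levels(partitions, levels, budget):
    """Pick one cache level per partition, minimizing total access cost.

    partitions: list of (partition_id, size_units, read_count)
    budget: total memory units available (non-negative int)
    Returns (total_cost, {partition_id: level_name}).
    """
    inf = float("inf")
    n = len(partitions)
    # dp[i][m]: minimum cost for the first i partitions using at most m units.
    dp = [[inf] * (budget + 1) for _ in range(n + 1)]
    choice = [[None] * (budget + 1) for _ in range(n + 1)]
    dp[0] = [0] * (budget + 1)
    for i, (_pid, size, read_count) in enumerate(partitions, start=1):
        for m in range(budget + 1):
            for name, (mem_factor, read_cost) in levels.items():
                mem = size * mem_factor
                if mem > m:
                    continue
                cost = dp[i - 1][m - mem] + size * read_count * read_cost
                if cost < dp[i][m]:
                    dp[i][m] = cost
                    choice[i][m] = name
    if dp[n][budget] == inf:
        raise ValueError("no feasible assignment for this budget")
    # Walk the choice table backwards to recover the per-partition plan.
    plan, m = {}, budget
    for i in range(n, 0, -1):
        pid, size, _ = partitions[i - 1]
        name = choice[i][m]
        plan[pid] = name
        m -= size * levels[name][0]
    return dp[n][budget], plan


# Toy iterative job: "points" is read by two later stages.
stages = [
    ("points", ["input"]),
    ("grad1", ["points", "weights"]),
    ("grad2", ["points", "weights"]),
    ("summary", ["grad2"]),
]
reads = count_future_reads(stages)
parts = [
    ("points_p0", 2, reads["points"]),
    ("points_p1", 2, reads["points"]),
    ("grad2_p0", 1, reads["grad2"]),
]
total_cost, plan = choose_cache_levels(parts, LEVELS, budget=5)
print(total_cost, plan)
```

In this toy setup the choice is made per partition rather than per RDD, and rerunning `choose_cache_levels` with a different `budget` yields a different plan, which is the kind of runtime adaptation the abstract describes. A real system would also need measured sizes and costs in place of the constants assumed here.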

We have implemented a prototype of Neutrino as an extension to Spark and use four different machine-learning workloads for performance evaluation. Neutrino improves the average job execution time by up to 70% over the use of Spark’s native memory cache levels.

BibTeX

@inproceedings {196376,
author = {Erci Xu and Mohit Saxena and Lawrence Chiu},
title = {Neutrino: Revisiting Memory Caching for Iterative Data Analytics},
booktitle = {8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 16)},
year = {2016},
address = {Denver, CO},
url = {https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/xu},
publisher = {USENIX Association},
month = jun
}
