Skip to main content
USENIX
  • Conferences
  • Students
Sign in
  • Home
  • Attend
    • Registration Information
    • Registration Discounts
    • Venue, Hotel, and Travel
    • Students and Grants
    • Co-located Events
  • Program
    • Workshop Program
  • Sponsorship
  • Participate
    • Instructions for Authors and Speakers
    • Call for Papers
  • About
    • Workshop Organizers
    • Help Promote
    • Questions
    • Past Workshops
  • Home
  • Attend
  • Program
  • Sponsorship
  • Participate
  • About

sponsors

Gold Sponsor
Gold Sponsor
Gold Sponsor
Silver Sponsor
Bronze Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Media Sponsor
Industry Partner

help promote

HotStorage '16 button

connect with us


  •  Twitter
  •  Facebook
  •  LinkedIn
  •  Google+
  •  YouTube

twitter

Tweets by @usenix

usenix conference policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

You are here

Home » Neutrino: Revisiting Memory Caching for Iterative Data Analytics
Tweet

connect with us

Neutrino: Revisiting Memory Caching for Iterative Data Analytics

Authors: 

Erci Xu, The Ohio State University; Mohit Saxena and Lawrence Chiu, IBM Almaden Research Center

Abstract: 

In-memory analytics frameworks such as Apache Spark are rapidly gaining popularity as they provide order of magnitude performance speedup over disk-based systems for iterative workloads. For example, Spark uses the Resilient Distributed Dataset (RDD) abstraction to cache data in memory and iteratively compute on it in a distributed cluster.

In this paper, we make the case that existing abtractions such as RDD are coarse-grained and only allow discrete cache levels to be used for caching data. This results in inefficient memory utilization and lower than optimal performance. In addition, relying on the programmer to enforce caching decisions for an RDD makes it infeasible for the system to adapt to runtime changes. To overcome these challenges, we propose Neutrino that employs fine-grained memory caching of RDD partitions and adapts to the use of different in-memory cache levels based on runtime characteristics of the cluster. First, it extracts a data flow graph to capture the data access dependencies between RDDs across different stages of a Spark application without relying on cache enforcement decisions from the programmer. Second, it uses a dynamic-programming based algorithm to guide caching decisions across the cluster and adaptively convert or discard the RDD partitions from the different cache levels.

We have implemented a prototype of Neutrino as an extension to Spark and use four different machine-learning workloads for performance evaluation. Neutrino improves the average job execution time by up to 70% over the use of Spark’s native memory cache levels.

Erci Xu, The Ohio State University

Mohit Saxena, IBM Almaden Research Center

Lawrence Chiu, IBM Almaden Research Center

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {196376,
author = {Erci Xu and Mohit Saxena and Lawrence Chiu},
title = {Neutrino: Revisiting Memory Caching for Iterative Data Analytics},
booktitle = {8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 16)},
year = {2016},
address = {Denver, CO},
url = {https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/xu},
publisher = {USENIX Association},
month = jun,
}
Download
Xu PDF
View the slides
  • Log in or    Register to post comments

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Media Sponsors & Industry Partners

© USENIX

  • Privacy Policy
  • Contact Us