USENIX
  • OSDI '14 Home
GraphX: Graph Processing in a Distributed Dataflow Framework

Thursday, August 7, 2014 - 3:30pm
Authors: 

Joseph E. Gonzalez, University of California, Berkeley; Reynold S. Xin, University of California, Berkeley, and Databricks; Ankur Dave, Daniel Crankshaw, and Michael J. Franklin, University of California, Berkeley; Ion Stoica, University of California, Berkeley, and Databricks

Abstract: 

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
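
The claim that a graph abstraction can be built from a few basic dataflow operators (join, map, group-by) can be illustrated with a minimal sketch. The code below is not GraphX's API and does not use Spark; it is plain Python over in-memory collections, and the names `vertices`, `edges`, `out_degrees`, and `pagerank_step` are hypothetical, chosen only for this illustration. It shows one PageRank-style iteration written as a join of edges with vertex values, a map that emits messages, and a group-by that sums them per destination.

```python
from collections import defaultdict

# Hypothetical toy graph stored as two plain collections (an assumption for
# this sketch; a real dataflow system would distribute these across machines).
vertices = {1: 1.0, 2: 1.0, 3: 1.0}        # vertex id -> current rank
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]   # (source id, destination id)


def out_degrees(edges):
    """group-by: count outgoing edges per source vertex."""
    degree = defaultdict(int)
    for src, _dst in edges:
        degree[src] += 1
    return dict(degree)


def pagerank_step(vertices, edges, damping=0.85):
    """One PageRank-style iteration using only join, map, and group-by."""
    degree = out_degrees(edges)

    # join + map: look up each edge's source rank and out-degree, then emit
    # a (destination, contribution) message along the edge.
    messages = [(dst, vertices[src] / degree[src]) for src, dst in edges]

    # group-by: sum the messages arriving at each destination vertex.
    incoming = defaultdict(float)
    for dst, contribution in messages:
        incoming[dst] += contribution

    # map: combine the summed messages into each vertex's new rank.
    return {
        v: (1.0 - damping) + damping * incoming.get(v, 0.0)
        for v in vertices
    }


new_ranks = pagerank_step(vertices, edges)
```

Calling `pagerank_step` repeatedly, feeding each result back in as `vertices`, gives the iterative pattern the abstract refers to. In a distributed setting each of the three steps would map onto the corresponding dataflow operator, which is where join optimizations would matter.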

Joseph E. Gonzalez, University of California, Berkeley

Reynold S. Xin, University of California, Berkeley, and Databricks

Ankur Dave, University of California, Berkeley

Daniel Crankshaw, University of California, Berkeley

Michael J. Franklin, University of California, Berkeley

Ion Stoica, University of California, Berkeley, and Databricks

