Camdoop: Exploiting In-network Aggregation for Big Data Applications

Authors: 

Paolo Costa, Microsoft Research Cambridge and Imperial College London; Austin Donnelly, Antony Rowstron, and Greg O'Shea, Microsoft Research Cambridge

Abstract: 

Large companies like Facebook, Google, and Microsoft, as well as a number of small and medium enterprises, process massive amounts of data every day in batch jobs and in real-time applications. This generates high network traffic, which is hard to support using traditional, oversubscribed network infrastructures. To address this issue, several alternative network topologies have been proposed, aiming to increase the bandwidth available in enterprise clusters.

We observe that in many of the commonly used workloads, data is aggregated during the process and the output size is a fraction of the input size. This motivated us to explore a different point in the design space. Instead of increasing the bandwidth, we focus on decreasing the traffic by pushing aggregation from the edge into the network.

We built Camdoop, a MapReduce-like system running on CamCube, a cluster design that uses a direct-connect network topology with servers directly linked to other servers. Camdoop exploits the property that CamCube servers forward traffic to perform in-network aggregation of data during the shuffle phase. Camdoop supports the same functions used in MapReduce and is compatible with existing MapReduce applications. We demonstrate that, in common cases, Camdoop significantly reduces network traffic and provides a large performance increase both over a version of Camdoop running over a switch and over two production systems, Hadoop and Dryad/DryadLINQ.
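The idea of aggregating during the shuffle phase can be illustrated with a toy model. The sketch below is not Camdoop's implementation: the seven-server tree, the word-count workload, and all function names are assumptions chosen for illustration. It counts how many key-value pairs cross links when forwarding servers merge partial results on the way to the root, compared with forwarding every map output unmerged and aggregating only at the edge.

```python
from collections import Counter

def combine(parts):
    """Merge partial word counts (an associative, commutative reduce)."""
    total = Counter()
    for part in parts:
        total.update(part)
    return total

def aggregate_in_network(node, children, local, sent):
    """Each server merges its children's partial results with its own
    before forwarding; `sent` records the pairs crossing each link."""
    parts = [local[node]]
    for child in children.get(node, []):
        partial = aggregate_in_network(child, children, local, sent)
        sent.append(len(partial))  # pairs on the child -> node link
        parts.append(partial)
    return combine(parts)

def aggregate_at_edge(node, children, local, sent):
    """Each server forwards every descendant's output unmerged;
    only the root combines them."""
    parts = [local[node]]
    for child in children.get(node, []):
        forwarded = aggregate_at_edge(child, children, local, sent)
        sent.append(sum(len(p) for p in forwarded))
        parts.extend(forwarded)
    return parts

# Hypothetical layout: server 0 is the root of a small binary tree.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
# Hypothetical map output: every server saw the words "a" and "b" once.
local = {n: Counter({"a": 1, "b": 1}) for n in range(7)}

sent_in_network, sent_edge = [], []
result_in_network = aggregate_in_network(0, children, local, sent_in_network)
result_edge = combine(aggregate_at_edge(0, children, local, sent_edge))

print(sum(sent_in_network), sum(sent_edge))
```

Both strategies produce the same final counts, but in this toy setup the merged version moves fewer pairs across links, because each forwarded partial result is no larger than the set of distinct keys. The saving depends on how much the reduce function shrinks the data, which matches the abstract's observation that the output is often a fraction of the input.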

 


Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {180565,
author = {Paolo Costa and Austin Donnelly and Antony Rowstron and Greg O{\textquoteright}Shea},
title = {Camdoop: Exploiting In-network Aggregation for Big Data Applications},
booktitle = {9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12)},
year = {2012},
isbn = {978-931971-92-8},
address = {San Jose, CA},
pages = {29--42},
url = {https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/costa},
publisher = {USENIX Association},
month = apr,
}