Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions

Authors: 

Jiaxing Zhang and Hucheng Zhou, Microsoft Research Asia; Rishan Chen, Microsoft Research Asia and Peking University; Xuepeng Fan, Microsoft Research Asia and Huazhong University of Science and Technology; Zhenyu Guo and Haoxiang Lin, Microsoft Research Asia; Jack Y. Li, Microsoft Research Asia and Georgia Institute of Technology; Wei Lin and Jingren Zhou, Microsoft Bing; Lidong Zhou, Microsoft Research Asia

Abstract: 

Map/Reduce-style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as “black boxes”, we propose to analyze those functions to turn them into “gray boxes” that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework that is incorporated into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% in terms of disk and network I/O for shuffling, and up to 48% in terms of cross-pod network traffic.
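
To make the "gray box" idea concrete, here is a minimal Python sketch of how a functional property of a user-defined function can make a shuffle unnecessary. It is not SUDO's implementation. The abstract does not list the functional properties it uses, so monotonicity is assumed here purely for illustration, and every name (`is_range_partitioned`, `apply_udf`, `needs_reshuffle`) is hypothetical. The sketch also checks the property on sample data, whereas the abstract describes analyzing the functions themselves.

```python
from typing import Callable, List

Partition = List[int]


def is_range_partitioned(partitions: List[Partition]) -> bool:
    """True if every key in a partition is <= every key in the next non-empty one."""
    non_empty = [p for p in partitions if p]
    return all(max(a) <= min(b) for a, b in zip(non_empty, non_empty[1:]))


def apply_udf(partitions: List[Partition], udf: Callable[[int], int]) -> List[Partition]:
    """Apply a user-defined function to every key, partition by partition."""
    return [[udf(k) for k in p] for p in partitions]


def needs_reshuffle(partitions: List[Partition], udf: Callable[[int], int]) -> bool:
    """A re-shuffle is needed only if the UDF destroys the range partitioning."""
    return not is_range_partitioned(apply_udf(partitions, udf))


# Input that is already range-partitioned on its key (illustrative data).
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Assumed example UDFs: one strictly increasing, one not monotonic.
increasing = lambda k: 2 * k + 1
non_monotonic = lambda k: (k - 5) ** 2

print(needs_reshuffle(partitions, increasing))     # False: partitioning survives
print(needs_reshuffle(partitions, non_monotonic))  # True: keys must be re-shuffled
```

In this toy setting, knowing that a function is increasing is enough to conclude that its output stays range-partitioned, so the shuffle before the next stage could be skipped. A black-box function would force the shuffle regardless.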

 



BibTeX
@inproceedings {180657,
author = {Jiaxing Zhang and Hucheng Zhou and Rishan Chen and Xuepeng Fan and Zhenyu Guo and Haoxiang Lin and Jack Y. Li and Wei Lin and Jingren Zhou and Lidong Zhou},
title = {Optimizing Data Shuffling in {Data-Parallel} Computation by Understanding {User-Defined} Functions},
booktitle = {9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12)},
year = {2012},
isbn = {978-931971-92-8},
address = {San Jose, CA},
pages = {295--308},
url = {https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zhang},
publisher = {USENIX Association},
month = apr,
}
