HadoopProv: Towards Provenance as a First Class Citizen in MapReduce

Authors: 

Sherif Akoush, Ripduman Sohan, and Andy Hopper, University of Cambridge

Abstract: 

We introduce HadoopProv, a modified version of Hadoop that implements provenance capture and analysis in MapReduce jobs. It is designed to minimise provenance capture overheads by (i) treating provenance tracking in the Map and Reduce phases separately, and (ii) deferring construction of the provenance graph to the query stage. Provenance graphs are later joined on matching intermediate keys of the Map and Reduce provenance files. In our prototype implementation, HadoopProv has an overhead below 10% on typical job runtime (<7% and <30% average temporal increase on Map and Reduce tasks, respectively). Additionally, we demonstrate that provenance queries are serviceable in O(k log n), where n is the number of records per Map task and k is the number of Map tasks in which the key appears.
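The abstract's central idea is to record Map-side and Reduce-side provenance independently and join them on the intermediate key only when a query arrives. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HadoopProv's actual implementation. The file layouts, the class names `MapProvenance` and `ReduceProvenance`, the function `trace_output`, and the assumption that the Reduce side records which Map tasks supplied each key are all hypothetical. Keeping each Map task's provenance sorted by key turns each lookup into a binary search over that task's n records. A query that touches k Map tasks therefore costs roughly O(k log n), plus the matches it returns, in line with the bound stated above.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class MapProvenance:
    """Hypothetical per-Map-task provenance file: (intermediate_key, input_offset)
    pairs, kept sorted by key so that a lookup is a binary search."""
    task_id: int
    entries: list[tuple[str, int]]

    def __post_init__(self) -> None:
        # Sort once at capture time (a copy, so the caller's list is untouched).
        self.entries = sorted(self.entries)

    def offsets_for(self, key: str) -> list[int]:
        # bisect_left on the 1-tuple (key,) lands on the first entry whose key
        # equals `key`, because (key,) sorts before any (key, offset).
        i = bisect_left(self.entries, (key,))
        found = []
        while i < len(self.entries) and self.entries[i][0] == key:
            found.append(self.entries[i][1])
            i += 1
        return found


@dataclass
class ReduceProvenance:
    """Hypothetical Reduce-side provenance: output record id ->
    (intermediate key, ids of the Map tasks whose output carried that key)."""
    outputs: dict[str, tuple[str, list[int]]]


def trace_output(record_id: str, reduce_prov: ReduceProvenance,
                 map_provs: dict[int, MapProvenance]) -> dict[int, list[int]]:
    """Build the provenance of one output record at query time by joining the
    Reduce and Map provenance on the intermediate key (deferred construction)."""
    key, task_ids = reduce_prov.outputs[record_id]
    lineage: dict[int, list[int]] = {}
    for t in task_ids:                          # the k Map tasks holding the key
        offsets = map_provs[t].offsets_for(key)  # O(log n) search plus matches
        if offsets:
            lineage[t] = offsets
    return lineage


# Usage with made-up data: two Map tasks, two Reduce outputs.
maps = {
    0: MapProvenance(0, [("apple", 10), ("pear", 42), ("apple", 7)]),
    1: MapProvenance(1, [("fig", 3), ("apple", 99)]),
}
reduce_prov = ReduceProvenance({"out-1": ("apple", [0, 1]), "out-2": ("fig", [1])})
print(trace_output("out-1", reduce_prov, maps))  # {0: [7, 10], 1: [99]}
```

Nothing in this sketch joins the two sides at capture time. Map and Reduce tasks only append to their own files, and the join cost is paid by the query. This mirrors the trade-off the abstract describes: cheap capture, with graph construction deferred.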



BibTeX
@inproceedings {180758,
author = {Sherif Akoush and Ripduman Sohan and Andy Hopper},
title = {{HadoopProv}: Towards Provenance as a First Class Citizen in {MapReduce}},
booktitle = {5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13)},
year = {2013},
address = {Lombard, IL},
url = {https://www.usenix.org/conference/tapp13/technical-sessions/presentation/akoush},
publisher = {USENIX Association},
month = apr,
}
© USENIX
