Technical Sessions

To access a presentation's content, please click on its title below.

All sessions will take place in Grand Ballroom J unless otherwise noted.

 

Tuesday, April 2, 2013

8:30 a.m.–9:00 a.m. Tuesday

Continental Breakfast

Grand Ballroom Foyers

9:00 a.m.–9:15 a.m. Tuesday

Opening Remarks

Program Co-Chairs: Alexandra Meliou, University of Massachusetts, Amherst, and Val Tannen, University of Pennsylvania

9:15 a.m.–10:30 a.m. Tuesday

Keynote Address

Managing the When-provenance of Data: Opportunities and Challenges

Wang-Chiew Tan, University of California, Santa Cruz

Available Media
10:30 a.m.–11:00 a.m. Tuesday

Break

Grand Ballroom Foyers

11:00 a.m.–12:30 p.m. Tuesday

Reproducibility and Audits

Session Chair: Paolo Missier, Newcastle University

ReproZip: Using Provenance to Support Computational Reproducibility

Fernando Chirigati, Polytechnic Institute of NYU; Dennis Shasha, New York University; Juliana Freire, Polytechnic Institute of NYU

We describe ReproZip, a tool that makes it easier for authors to publish reproducible results and for reviewers to validate these results. By tracking operating system calls, ReproZip systematically captures detailed provenance of existing experiments, including data dependencies, libraries used, and configuration parameters. This information is combined into a package that can be installed and run in a different environment. An important goal we have for ReproZip is usability. Besides simplifying the creation of reproducible results, the system also helps reviewers: because the package is self-contained, reviewers need not install any additional software to run the experiments. In addition, ReproZip generates a workflow specification for the experiment. This not only enables reviewers to execute the specification within a workflow system to explore the experiment and try different configurations, but also allows the provenance kept by the workflow system to facilitate communication between reviewers and authors.

Available Media
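
A minimal sketch of the capture idea the abstract describes: derive a file-dependency manifest by tracing a program's system calls. This assumes a Linux host with strace installed; ReproZip's actual capture and packaging are far more thorough.

```python
# Sketch only: approximate ReproZip-style dependency capture by running a
# command under strace and collecting every file it successfully opened.
import re
import subprocess
import sys

def capture_dependencies(command):
    """Run `command` under strace and return the set of files it opened."""
    trace_file = "trace.log"
    subprocess.run(
        ["strace", "-f", "-e", "trace=open,openat", "-o", trace_file] + command,
        check=True,
    )
    opened = set()
    pattern = re.compile(r'open(?:at)?\(.*?"([^"]+)"')  # path in first string arg
    with open(trace_file) as f:
        for line in f:
            match = pattern.search(line)
            # Skip failed opens, which strace reports as "= -1 ENOENT ...".
            if match and "= -1" not in line:
                opened.add(match.group(1))
    return opened

if __name__ == "__main__":
    for path in sorted(capture_dependencies(sys.argv[1:])):
        print(path)  # candidate contents for a self-contained package
```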

Using Provenance for Repeatability

Quan Pham, University of Chicago; Tanu Malik and Ian Foster, University of Chicago and Argonne National Laboratory

We present Provenance-To-Use (PTU), a tool that minimizes computation time during repeatability testing. Authors can use PTU to build a package that includes their software program and a provenance trace of an initial reference execution. Testers can select a subset of the package’s processes for a partial deterministic replay—based, for example, on their compute, memory and I/O utilization as measured during the reference execution. Using the provenance trace, PTU guarantees that events are processed in the same order using the same data from one execution to the next. We show the efficiency of PTU for conducting repeatability testing of workflow-based scientific programs. 

Available Media
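
A toy model of the partial deterministic replay the abstract describes (not PTU's implementation): each process in the reference run is recorded with its inputs, output, and the output data it produced; at test time only the processes a tester selects are re-executed, and everything else is served from the recorded trace, so downstream steps see identical data in the same order.

```python
# Sketch of provenance-guided partial replay. Steps are (name, func,
# input_names, output_name); the "store" maps artifact names to values.
def reference_run(steps, store):
    """Execute every step, recording each output value as provenance."""
    trace = []
    for name, func, inputs, output in steps:
        store[output] = func(*(store[i] for i in inputs))
        trace.append((name, inputs, output, store[output]))
    return trace

def partial_replay(steps, trace, store, selected):
    """Re-execute only `selected` steps; replay the rest from the trace."""
    funcs = {name: func for name, func, _, _ in steps}
    for name, inputs, output, recorded in trace:
        if name in selected:
            store[output] = funcs[name](*(store[i] for i in inputs))
        else:
            store[output] = recorded  # deterministic: reuse recorded data

if __name__ == "__main__":
    steps = [
        ("clean", lambda xs: [x for x in xs if x is not None], ["raw"], "clean"),
        ("scale", lambda xs: [10 * x for x in xs], ["clean"], "scaled"),
    ]
    store = {"raw": [1, None, 3]}
    trace = reference_run(steps, dict(store))
    replay_store = dict(store)
    partial_replay(steps, trace, replay_store, selected={"scale"})
    print(replay_store["scaled"])  # [10, 30], without re-running "clean"
```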

Supporting Undo and Redo in Scientific Data Analysis

Xiang Zhao, University of Massachusetts, Amherst; Emery R. Boose, Harvard University; Yuriy Brun, University of Massachusetts, Amherst; Barbara Staudt Lerner, Mount Holyoke College; Leon J. Osterweil, University of Massachusetts, Amherst

This paper presents a provenance-based technique to support undoing and redoing data analysis tasks. Our technique targets scientists who experiment with combinations of approaches to processing raw data into presentable datasets. Raw data may be noisy and in need of cleaning, it may suffer from sensor drift that requires retrospective calibration and data correction, or it may need gap-filling due to sensor malfunction or environmental conditions. Different raw datasets may have different issues requiring different kinds of adjustments, and each issue may potentially be handled by different approaches. Thus, scientists must often experiment with different sequences of approaches. In our work, we show how provenance information can be used to facilitate this kind of experimentation with scientific datasets. We describe an approach that supports the ability to (1) undo a set of tasks while setting aside the artifacts and consequences of performing those tasks, (2) replace, remove, or add a data-processing technique, and (3) automatically redo those set-aside tasks that are consistent with the changed technique. We have implemented our technique and demonstrate its utility with a case study of a common sensor-network data-processing scenario, showing how our approach can reduce the cost of changing intermediate data-processing techniques in a complex, data-intensive process.

Available Media
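
A minimal sketch of the replace-and-redo idea for a linear pipeline (the paper handles richer process structure): because every step's output artifact is retained, swapping one technique recomputes only from that point on, and the set-aside downstream steps are redone automatically.

```python
# Sketch: a pipeline whose intermediate artifacts double as provenance,
# so replacing one step redoes only the affected suffix.
class Pipeline:
    def __init__(self, steps, raw):
        self.steps = list(steps)        # list of (name, func)
        self.raw = raw
        self.artifacts = []             # artifacts[i] = output of steps[i]
        self._run_from(0)

    def _run_from(self, k):
        data = self.raw if k == 0 else self.artifacts[k - 1]
        del self.artifacts[k:]          # set aside stale downstream artifacts
        for name, func in self.steps[k:]:
            data = func(data)
            self.artifacts.append(data)

    def replace(self, name, func):
        """Swap in a new technique and redo only the steps it affects."""
        k = next(i for i, (n, _) in enumerate(self.steps) if n == name)
        self.steps[k] = (name, func)
        self._run_from(k)

if __name__ == "__main__":
    p = Pipeline(
        [("fill_gaps", lambda xs: [x if x is not None else 0 for x in xs]),
         ("calibrate", lambda xs: [x + 1 for x in xs])],
        raw=[4, None, 6],
    )
    p.replace("fill_gaps", lambda xs: [x if x is not None else -1 for x in xs])
    print(p.artifacts[-1])  # calibrate is redone automatically: [5, 0, 7]
```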

Android Provenance: Diagnosing Device Disorders

Nathaniel Husted, Indiana University; Sharjeel Qureshi, Dawood Tariq, and Ashish Gehani, SRI International

Mobile devices are a ubiquitous part of our daily lives. Smartphones are being used in many areas where data privacy and integrity are a concern. One threat to integrity and privacy is the existence of bugs in operating system code. Little has been done to provide tools for system-wide runtime profiling and accountability. We propose operating system auditing and data provenance tracking as mechanisms for generating useful traces of system activity and information flow on mobile devices. The goal of these traces is to enable debugging and profiling of complicated system issues such as increased power drain. We contribute a prototype system for Android-based mobile devices and provide realistic examples of how our system can be used for debugging.

Available Media
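
To illustrate the kind of analysis such traces enable, here is a small sketch of one debugging use the abstract mentions: ranking processes by system activity to find a likely culprit for increased power drain. The log format and event names below are invented for illustration; the paper's prototype works from real Android audit data.

```python
# Sketch: aggregate per-process system-activity events from an audit trace
# and rank processes by activity volume (a rough power-drain heuristic).
from collections import Counter

def rank_by_activity(events):
    """events: iterable of (timestamp, process, event_type) tuples."""
    return Counter(proc for _, proc, _ in events).most_common()

if __name__ == "__main__":
    log = [
        (1.0, "com.example.sync", "wakelock_acquire"),
        (1.2, "com.example.sync", "network_send"),
        (1.3, "com.example.game", "wakelock_acquire"),
        (1.4, "com.example.sync", "wakelock_acquire"),
    ]
    for proc, n in rank_by_activity(log):
        print(proc, n)  # com.example.sync dominates this trace
```
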
12:30 p.m.–2:00 p.m. Tuesday

Workshop Luncheon

Grand Ballroom HI

2:00 p.m.–3:30 p.m. Tuesday

Provenance Capture and Analysis

Session Chair: Juliana Freire, New York University

Provenance for Data Mining

Boris Glavic, Illinois Institute of Technology; Javed Siddique, Periklis Andritsos, and Renee J. Miller, University of Toronto

Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates this reduction in size, the loss of information it entails can be problematic. Specifically, the results of data mining may be more confusing than insightful, if the user is not able to understand on which input data they are based and how they were created. In this paper, we argue that the user needs access to the provenance of mining results. Provenance, while extensively studied by the database, workflow, and distributed systems communities, has not yet been considered for data mining. We analyze the differences between database, workflow, and data mining provenance, suggest new types of provenance, and identify new use cases for provenance in data mining. To illustrate our ideas, we present a more detailed discussion of these concepts for two typical data mining algorithms: frequent item set mining and multi-dimensional scaling.

Available Media
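
A sketch of one use case the paper identifies, for one of its two example algorithms: alongside each frequent itemset, retain its provenance, namely the IDs of the input transactions that support it, so a mining result can be traced back to the data it summarizes. The enumeration below is deliberately naive for clarity; real miners use Apriori or FP-growth.

```python
# Sketch: frequent itemset mining that returns, for each frequent itemset,
# the supporting transaction IDs as its provenance.
from itertools import combinations

def frequent_itemsets_with_provenance(transactions, min_support):
    """transactions: dict of tid -> set of items. Returns itemset -> tids."""
    items = sorted(set().union(*transactions.values()))
    results = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            tids = {t for t, s in transactions.items() if set(candidate) <= s}
            if len(tids) >= min_support:
                results[candidate] = tids  # provenance: supporting transactions
                found = True
        if not found:
            break  # none of this size, so none larger either (Apriori property)
    return results

if __name__ == "__main__":
    db = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b"}}
    for itemset, tids in frequent_itemsets_with_provenance(db, 2).items():
        print(itemset, "supported by", sorted(tids))
```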

Provenance Analyzer: Exploring Provenance Semantics with Logic Rules

Saumen Dey, Sean Riddle, and Bertram Ludäscher, University of California, Davis

Abstract not available.

Available Media

Declaratively Processing Provenance Metadata

Scott Moore, Harvard University; Ashish Gehani, SRI International

Systems that gather fine-grained provenance metadata must process and store large amounts of information. Filtering this metadata as it is collected has a number of benefits, including reducing the amount of persistent storage required and simplifying subsequent provenance queries. However, writing these filters in a procedural language is verbose and error prone. We propose a simple declarative language for processing provenance metadata and evaluate it by translating filters implemented in SPADE, an open-source provenance collection platform.

Available Media
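
The paper's language targets filters in SPADE; the sketch below only conveys the general shape of the declarative approach: rules are data (field, operator, value, action) rather than procedural code, and a tiny evaluator applies them to each provenance record as it streams in. All rule syntax here is invented.

```python
# Sketch: declarative first-match-wins filtering of provenance records.
OPS = {"==": lambda a, b: a == b, "!=": lambda a, b: a != b}

RULES = [
    ("type", "==", "FileRead", "drop"),   # discard noisy read events
    ("user", "!=", "root", "keep"),
]

def filter_stream(records, rules=RULES, default="keep"):
    for record in records:
        action = default
        for field, op, value, rule_action in rules:
            if field in record and OPS[op](record[field], value):
                action = rule_action
                break  # first matching rule wins
        if action == "keep":
            yield record

if __name__ == "__main__":
    stream = [
        {"type": "FileRead", "user": "alice"},
        {"type": "FileWrite", "user": "alice"},
    ]
    print(list(filter_stream(stream)))  # only the FileWrite survives
```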

OPUS: A Lightweight System for Observational Provenance in User Space

Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, and Andy Hopper, University of Cambridge

A variety of current provenance systems address the challenges of provenance capture, storage, and query. However, they require special setup and configuration, do not capture all I/O operations, and limit themselves to specific, specialised platforms. In this paper we propose the design of a data provenance capture and query tool called OPUS. OPUS works entirely in user space, is lightweight, and requires minimal user intervention. OPUS is based on a formal model for versioning provenance objects that enables a succinct, complete representation of I/O operations in a manner that abstracts them from the details of the underlying operating system.

Available Media
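
A toy rendering of the versioning idea (the paper defines a formal model; this only conveys the flavor): each write to a provenance object creates a new version linked to its writer, so the object's full I/O history is preserved without mutating past state.

```python
# Sketch: a provenance object whose writes append immutable version records.
class VersionedObject:
    def __init__(self, name):
        self.name = name
        self.versions = []              # list of (version, writer) records

    def write(self, writer):
        version = len(self.versions) + 1
        self.versions.append((version, writer))
        return version

    def history(self):
        return [(self.name, v, w) for v, w in self.versions]

if __name__ == "__main__":
    log = VersionedObject("/tmp/results.csv")
    log.write("analysis.py")
    log.write("plot.py")
    print(log.history())
    # [('/tmp/results.csv', 1, 'analysis.py'), ('/tmp/results.csv', 2, 'plot.py')]
```
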
3:30 p.m.–5:00 p.m. Tuesday

Poster Session with Refreshments

Grand Ballroom I

 

Wednesday, April 3, 2013

8:45 a.m.–9:15 a.m. Wednesday

Continental Breakfast

Grand Ballroom Foyers

9:15 a.m.–10:30 a.m. Wednesday

Keynote Address

World Domination Through Provenance

Margo Seltzer, Harvard School of Engineering and Applied Sciences and Oracle

Available Media
10:30 a.m.–11:00 a.m. Wednesday

Break

Grand Ballroom Foyers

11:00 a.m.–12:30 p.m. Wednesday

Provenance Models and Applications

Session Chair: Boris Glavic, Illinois Institute of Technology

D-PROV: Extending the PROV Provenance Model with Workflow Structure

Paolo Missier, Newcastle University; Saumen Dey, University of California, Davis; Khalid Belhajjame, University of Manchester; Victor Cuevas-Vicenttín and Bertram Ludäscher, University of California, Davis

This paper presents an extension to the W3C PROV provenance model, aimed at representing process structure. Although the modelling of process structure is out of the scope of the PROV specification, it is beneficial when capturing and analyzing the provenance of data that is produced by programs or other formally encoded processes. In the paper, we motivate the need for such an extended model in the context of an ongoing large data federation and preservation project, DataONE, where provenance traces of scientific workflow runs are captured and stored alongside the data products. We introduce new provenance relations for modelling process structure along with their usage patterns, and present sample queries that demonstrate their benefit.

Available Media
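
A hedged illustration of the paper's theme: keep prospective structure (how workflow steps are wired) alongside retrospective PROV-style trace triples, and query across both layers. Of the relation names below, only wasGeneratedBy is standard PROV vocabulary; connectsTo and instanceOf are invented for this sketch and are not the paper's relations.

```python
# Sketch: a two-layer triple store mixing workflow structure and a run trace.
PROSPECTIVE = [
    ("clean", "connectsTo", "analyze"),          # workflow wiring
]
RETROSPECTIVE = [
    ("run1:clean", "instanceOf", "clean"),
    ("run1:analyze", "instanceOf", "analyze"),
    ("out.csv", "wasGeneratedBy", "run1:analyze"),
]

def generating_step(artifact):
    """Which workflow-level step (not just which run) produced an artifact?"""
    triples = PROSPECTIVE + RETROSPECTIVE
    activity = next(o for s, p, o in triples
                    if s == artifact and p == "wasGeneratedBy")
    return next(o for s, p, o in triples
                if s == activity and p == "instanceOf")

if __name__ == "__main__":
    print(generating_step("out.csv"))  # "analyze": a query spanning both layers
```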

IPAPI: Designing an Improved Provenance API

Lucian Carata, Ripduman Sohan, Andrew Rice, and Andy Hopper, University of Cambridge

We investigate the main limitations imposed by existing provenance systems in the development of provenance aware applications. In the case of disclosed provenance APIs, most of those limitations can be traced back to the inability to integrate provenance from different sources, layers and of different granularities into a coherent view of data production. We consider possible solutions in the design of an Improved Provenance API (IPAPI), based on a general model of how different system entities interact to generate, accumulate or propagate provenance. The resulting architecture enables a whole new range of provenance capture scenarios, for which available APIs do not provide adequate support.

Available Media
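
A sketch of the integration problem the abstract describes, with an interface invented purely for illustration (IPAPI's actual design differs): provenance disclosed by different layers, such as the application, a library, and the OS, flows through one API so that records about the same data item can be stitched into a single coherent view.

```python
# Sketch: one API surface accepting provenance disclosures from many layers.
from collections import defaultdict

class ProvenanceAPI:
    def __init__(self):
        self.records = defaultdict(list)   # data item -> cross-layer records

    def disclose(self, item, layer, detail):
        """Any layer can disclose provenance about any data item."""
        self.records[item].append((layer, detail))

    def view(self, item):
        """A coherent, cross-layer account of how `item` was produced."""
        return sorted(self.records[item])

if __name__ == "__main__":
    api = ProvenanceAPI()
    api.disclose("report.pdf", "os", "written by pid 4242")
    api.disclose("report.pdf", "app", "rendered from template v3")
    for layer, detail in api.view("report.pdf"):
        print(f"[{layer}] {detail}")
```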

HadoopProv: Towards Provenance as a First Class Citizen in MapReduce

Sherif Akoush, Ripduman Sohan, and Andy Hopper, University of Cambridge

We introduce HadoopProv, a modified version of Hadoop that implements provenance capture and analysis in MapReduce jobs. It is designed to minimise provenance capture overheads by (i) treating provenance tracking in the Map and Reduce phases separately, and (ii) deferring construction of the provenance graph to the query stage. Provenance graphs are later joined on matching intermediate keys of the Map and Reduce provenance files. In our prototype implementation, HadoopProv has an overhead below 10% on typical job runtime (<7% and <30% average temporal increase on Map and Reduce tasks, respectively). Additionally, we demonstrate that provenance queries are serviceable in O(k log n) time, where n is the number of records per Map task and k is the number of Map tasks in which the key appears.

Available Media
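
A sketch of the deferred-join idea behind the query bound, with a data layout invented for illustration: if each Map task writes its (intermediate key, input record ID) pairs sorted by key, a query for one key binary-searches each relevant task's pairs, O(log n) per task and O(k log n) over the k tasks containing the key, and the join with Reduce-side provenance happens only at query time.

```python
# Sketch: query-time join over per-task, key-sorted map provenance files.
import bisect

def map_provenance_lookup(sorted_pairs, key):
    """sorted_pairs: list of (key, record_id) tuples sorted by key."""
    lo = bisect.bisect_left(sorted_pairs, (key,))
    hi = bisect.bisect_right(sorted_pairs, (key, float("inf")))
    return [rid for _, rid in sorted_pairs[lo:hi]]

def query(key, map_task_files):
    """Collect the input records behind `key` across all Map tasks."""
    inputs = []
    for pairs in map_task_files:        # the k tasks in which the key appears
        inputs.extend(map_provenance_lookup(pairs, key))
    return inputs

if __name__ == "__main__":
    task0 = [("apple", 1), ("banana", 2)]
    task1 = [("apple", 7), ("cherry", 9)]
    print(query("apple", [task0, task1]))  # input records [1, 7]
```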

A Provenance Model for Key-value Systems

Devdatta Kulkarni

In this paper we present the key-value provenance model (KVPM). In KVPM, provenance information can be collected for both the data and the data's schema in a key-value system. Collection of the information is application-driven, and it can be collected at different levels of the data-model hierarchy. We present the capabilities of the KVPM system along with its design, its implementation for Cassandra, and an initial evaluation.

Available Media
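
A toy key-value store with application-driven provenance capture in the spirit the abstract describes. The paper's system targets Cassandra; this standalone sketch only shows the shape: the application decides, per write, whether to record provenance, and each record carries the level of the data-model hierarchy it describes.

```python
# Sketch: opt-in, per-write provenance capture for a key-value store.
import time

class ProvKVStore:
    def __init__(self):
        self.data = {}
        self.provenance = []            # (when, level, key, who) records

    def put(self, key, value, who, level="row", record=True):
        self.data[key] = value
        if record:                      # application-driven: capture is opt-in
            self.provenance.append((time.time(), level, key, who))

    def provenance_of(self, key):
        return [p for p in self.provenance if p[2] == key]

if __name__ == "__main__":
    store = ProvKVStore()
    store.put("user:42:name", "Ada", who="import_job", level="column")
    store.put("tmp:cache", "x", who="cache", record=False)  # no provenance
    print(store.provenance_of("user:42:name"))
```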
