Skip to main content
USENIX
  • Conferences
  • Students
Sign in
  • Overview
  • Workshop Organizers
  • Registration Information
  • Calendar
  • Technical Sessions
  • Hotel and Travel Information
  • Services
  • Sponsorship
  • Help Promote!
  • For Participants
  • Call for Contributions
  • Past Workshops

connect with us


  •  Twitter
  •  Facebook
  •  LinkedIn
  •  Google+
  •  YouTube

twitter

Tweets by @usenix

usenix conference policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

You are here

Home » Provenance for Data Mining
Tweet

connect with us

http://twitter.com/usenix
https://www.facebook.com/usenixassociation
http://www.linkedin.com/groups/USENIX-Association-49559/about
https://plus.google.com/108588319090208187909/posts
http://www.youtube.com/user/USENIXAssociation

Provenance for Data Mining

Authors: 

Boris Glavic, Illinois Institute of Technology; Javed Siddique, Periklis Andritsos, and Renee J. Miller, University of Toronto

Abstract: 

Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates this reduction in size, the loss of information it entails can be problematic. Specifically, the results of data mining may be more confusing than insightful, if the user is not able to understand on which input data they are based and how they were created. In this paper, we argue that the user needs access to the provenance of mining results. Provenance, while extensively studied by the database, workflow, and distributed systems communities, has not yet been considered for data mining. We analyze the differences between database, workflow, and data mining provenance, suggest new types of provenance, and identify new use cases for provenance in data mining. To illustrate our ideas, we present a more detailed discussion of these concepts for two typical data mining algorithms: frequent item set mining and multi-dimensional scaling.

Boris Glavic, Illinois Institute of Technology

Javed Siddique, University of Toronto

Periklis Andritsos, University of Toronto

Renee J. Miller, University of Toronto

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

Glavic PDF
View the slides

Presentation Video

Presentation Audio

MP3 Download OGG Download

Download Audio

  • Log in or    Register to post comments

© USENIX

  • Privacy Policy
  • Contact Us