Metadata Considered Harmful…to Deduplication

Authors: 

Xing Lin, University of Utah; Fred Douglis and Jim Li, EMC Corporation; Xudong Li, Nankai University; Robert Ricci, University of Utah; Stephen Smaldone and Grant Wallace, EMC Corporation

Abstract: 

Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata with data in ways that result in otherwise identical data being different. We examine three models for improving deduplication in the presence of embedded metadata: deduplication-friendly data formats, application-level post-processing, and format-aware deduplication. Working with real-world file formats and datasets, we find that by separating metadata from data, deduplication ratios are improved significantly, in some cases by as much as 5.6×.
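The effect described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and does not reproduce their measurements. It assumes fixed-size chunking with SHA-256 fingerprints, a hypothetical record layout with an 8-byte metadata header in front of every 56-byte payload, and made-up sizes (`CHUNK`, `DATA_LEN`, `N_BLOCKS`). Two copies of the same payloads are written with different metadata values, once with metadata interleaved and once with metadata separated from the data.

```python
import hashlib
import random
import struct

CHUNK = 64      # toy chunk size in bytes (assumption; real systems use far larger chunks)
DATA_LEN = 56   # payload bytes per record (assumption)
N_BLOCKS = 200  # records per file (assumption)

def make_payloads(seed=0):
    """Generate the shared file content: N_BLOCKS pseudo-random payloads."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(DATA_LEN))
            for _ in range(N_BLOCKS)]

def header(base, i):
    """Hypothetical 8-byte per-record metadata, e.g. a timestamp."""
    return struct.pack(">Q", base + i)

def interleaved(payloads, base):
    """Layout with a metadata header in front of every payload."""
    return b"".join(header(base, i) + p for i, p in enumerate(payloads))

def separated(payloads, base):
    """Layout with all payloads first and all metadata afterwards."""
    meta = b"".join(header(base, i) for i in range(len(payloads)))
    return b"".join(payloads) + meta

def dedup_ratio(streams):
    """Logical bytes divided by the bytes of unique chunks across all streams."""
    logical = 0
    unique = {}  # chunk fingerprint -> chunk length
    for s in streams:
        for off in range(0, len(s), CHUNK):
            c = s[off:off + CHUNK]
            logical += len(c)
            unique[hashlib.sha256(c).digest()] = len(c)
    return logical / sum(unique.values())

payloads = make_payloads()
# Same payloads, different metadata values (base 1000 vs. base 5000).
ratio_interleaved = dedup_ratio([interleaved(payloads, 1000),
                                 interleaved(payloads, 5000)])
ratio_separated = dedup_ratio([separated(payloads, 1000),
                               separated(payloads, 5000)])
print(f"interleaved: {ratio_interleaved:.2f}  separated: {ratio_separated:.2f}")
```

In this toy setup every chunk of the interleaved layout contains a header, so no chunk is shared between the two copies and the ratio stays at 1.0. In the separated layout all data chunks are identical across the copies and only the metadata chunks differ. The exact numbers depend entirely on the assumed sizes and say nothing about the formats studied in the paper.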

BibTeX
@inproceedings {190565,
author = {Xing Lin and Fred Douglis and Jim Li and Xudong Li and Robert Ricci and Stephen Smaldone and Grant Wallace},
title = {Metadata Considered {Harmful{\textellipsis}to} Deduplication},
booktitle = {7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 15)},
year = {2015},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotstorage15/workshop-program/presentation/lin},
publisher = {USENIX Association},
month = jul,
}