Metadata Considered Harmful…to Deduplication

Xing Lin, University of Utah; Fred Douglis and Jim Li, EMC Corporation; Xudong Li, Nankai University; Robert Ricci, University of Utah; Stephen Smaldone and Grant Wallace, EMC Corporation

Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata with data in ways that make otherwise identical data appear different. We examine three models for improving deduplication in the presence of embedded metadata: deduplication-friendly data formats, application-level post-processing, and format-aware deduplication. Working with real-world file formats and datasets, we find that separating metadata from data improves deduplication ratios significantly, in some cases by as much as 5.6×.
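
The following minimal Python sketch illustrates why interspersed metadata hurts deduplication. It is not the authors' implementation. It assumes fixed-size 4 KiB chunking with SHA-256 fingerprints and a hypothetical record format whose 8-byte header (a version number plus a record index) is stored either in front of every record or grouped in a separate, chunk-aligned region. Two versions of a file are stored whose data is identical but whose headers differ. All function names and parameters are illustrative.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed chunk size (4 KiB) for this sketch


def split_chunks(blob: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size chunks; the last one may be shorter."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def dedup_ratio(files: list[bytes]) -> float:
    """Logical bytes written divided by the bytes of unique chunks kept."""
    logical = 0
    stored: dict[bytes, int] = {}  # SHA-256 fingerprint -> chunk length
    for blob in files:
        for chunk in split_chunks(blob):
            logical += len(chunk)
            stored.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return logical / sum(stored.values())


def header(version: int, index: int) -> bytes:
    """Hypothetical 8-byte metadata header: format version + record index."""
    return version.to_bytes(4, "big") + index.to_bytes(4, "big")


def interleaved_format(records: list[bytes], version: int) -> bytes:
    """Metadata embedded in front of every record, mixed in with the data."""
    return b"".join(header(version, i) + rec for i, rec in enumerate(records))


def separated_format(records: list[bytes], version: int) -> bytes:
    """Same metadata, grouped in its own region padded to a chunk boundary,
    so the data region starts chunk-aligned and is identical across versions."""
    meta = b"".join(header(version, i) for i in range(len(records)))
    meta += b"\0" * (-len(meta) % CHUNK_SIZE)
    return meta + b"".join(records)


if __name__ == "__main__":
    # 64 random 4 KiB records; the two versions differ only in their metadata.
    records = [os.urandom(CHUNK_SIZE) for _ in range(64)]
    for name, fmt in [("interleaved", interleaved_format),
                      ("separated", separated_format)]:
        ratio = dedup_ratio([fmt(records, 1), fmt(records, 2)])
        print(f"{name}: dedup ratio {ratio:.2f}")
    # prints: interleaved: dedup ratio 1.00
    #         separated: dedup ratio 1.97
```

In the interleaved layout every full 4 KiB chunk contains a changed header, so almost nothing deduplicates. The separated layout confines the changes to a single metadata chunk per version and stores the data once, approaching the ideal ratio of 2.0 for two copies. Fixed-size chunking keeps the sketch short. Many deduplicating systems use content-defined chunking instead, but metadata embedded more often than the average chunk size would be expected to perturb those chunks in a similar way.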

BibTeX
@inproceedings {190565,
author = {Xing Lin and Fred Douglis and Jim Li and Xudong Li and Robert Ricci and Stephen Smaldone and Grant Wallace},
title = {Metadata Considered {Harmful{\textellipsis}to} Deduplication},
booktitle = {7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 15)},
year = {2015},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotstorage15/workshop-program/presentation/lin},
publisher = {USENIX Association},
month = jul
}