USENIX ATC '16

Get
Help Promote graphics!

connect with us


  •  Twitter
  •  Facebook
  •  LinkedIn
  •  Google+
  •  YouTube

twitter

Tweets by @usenix

usenix conference policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

You are here

Home ยป FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
Tweet

connect with us

FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication

Authors: 

Wen Xia, Huazhong University of Science and Technology and Sangfor Technologies Co., Ltd.; Yukun Zhou, Huazhong University of Science and Technology; Hong Jiang, University of Texas at Arlington; Dan Feng, Yu Hua, Yuchong Hu, Yucheng Zhang, and Qing Liu, Huazhong University of Science and Technology

Abstract: 

Content-Defined Chunking (CDC) has been playing a key role in data deduplication systems in the past 15 years or so due to its high redundancy detection ability. However, existing CDC-based approaches introduce heavy CPU overhead because they declare the chunk cut-points by computing and judging the rolling hashes of the data stream byte by byte. In this paper, we propose FastCDC, a Fast and efficient CDC approach, that builds and improves on the latest Gear-based CDC approach, one of the fastest CDC methods to our knowledge. The key idea behind FastCDC is the combined use of three key techniques, namely, simplifying and enhancing the hash judgment to address our observed challenges facing Gear-based CDC, skipping sub-minimum chunk cut-points to further speed up CDC, and normalizing the chunk-size distribution in a small specified region to address the problem of the decreased deduplication ratio stemming from the cut-point skipping. Our evaluation results show that, by using a combination of the three techniques, FastCDC is about 10x faster than the best of open-source Rabin-based CDC, and about 3x faster than the state-of-the-art Gear- and AE-based CDC, while achieving nearly the same deduplication ratio as the classic Rabin-based approach.
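The three techniques named in the abstract can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The Gear table is filled with seeded pseudo-random 64-bit values. The size constants (2 KiB minimum, 8 KiB normal, 64 KiB maximum) and the mask widths (15 and 11 one-bits) are illustrative choices. Placing each mask's 1-bits in the high-order positions is a simplification assumed here rather than the paper's exact mask layout. The function and constant names (`cut_point`, `chunk`, `GEAR`, `MASK_S`, `MASK_L`) are hypothetical.

```python
import random

# Hypothetical parameters (assumptions for illustration, not the paper's code).
MIN_SIZE = 2 * 1024      # sub-minimum region: no cut-point judgment here
NORMAL_SIZE = 8 * 1024   # target ("normal") chunk size
MAX_SIZE = 64 * 1024     # hard upper bound on chunk size
MASK64 = (1 << 64) - 1   # keep the fingerprint in 64 bits


def _high_bits_mask(k: int) -> int:
    """Mask with k one-bits in the high-order positions of a 64-bit word.

    Assumption: high bits of a Gear fingerprint depend on a wider byte
    window (up to the last 64 bytes), so judging them behaves like a
    larger sliding window.
    """
    return ((1 << k) - 1) << (64 - k)


# Normalized chunking: a stricter mask (more 1-bits) before NORMAL_SIZE and a
# looser mask (fewer 1-bits) after it pull chunk sizes toward NORMAL_SIZE.
MASK_S = _high_bits_mask(15)  # assumption: log2(8 KiB) + 2 bits
MASK_L = _high_bits_mask(11)  # assumption: log2(8 KiB) - 2 bits

# Gear table: 256 random 64-bit values, seeded only for reproducibility.
_rng = random.Random(2016)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cut_point(data: bytes, start: int = 0) -> int:
    """Return the end offset (exclusive) of the chunk beginning at `start`."""
    n = len(data) - start
    if n <= MIN_SIZE:
        return len(data)                  # short tail becomes one chunk
    end = start + min(n, MAX_SIZE)
    normal = start + min(n, NORMAL_SIZE)
    fp = 0
    i = start + MIN_SIZE                  # cut-point skipping: no hashing below MIN_SIZE
    while i < normal:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK64  # Gear roll: one shift, one add
        if fp & MASK_S == 0:              # simplified judgment: a single AND test
            return i + 1
        i += 1
    while i < end:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK64
        if fp & MASK_L == 0:              # easier condition past the normal size
            return i + 1
        i += 1
    return end                            # forced cut at MAX_SIZE (or end of data)


def chunk(data: bytes) -> list[tuple[int, int]]:
    """Split `data` into contiguous (offset, length) chunks."""
    chunks = []
    pos = 0
    while pos < len(data):
        end = cut_point(data, pos)
        chunks.append((pos, end - pos))
        pos = end
    return chunks
```

Because a 64-bit Gear fingerprint depends only on the last 64 bytes hashed, inserting a few bytes near the start of a stream perturbs the first cut-points, but later ones resynchronize with those of the unmodified stream. This is what lets a deduplication system recognize the unchanged chunks.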



BibTeX
@inproceedings {196196,
author = {Wen Xia and Yukun Zhou and Hong Jiang and Dan Feng and Yu Hua and Yuchong Hu and Qing Liu and Yucheng Zhang},
title = {{FastCDC}: A Fast and Efficient {Content-Defined} Chunking Approach for Data Deduplication},
booktitle = {2016 USENIX Annual Technical Conference (USENIX ATC 16)},
year = {2016},
isbn = {978-1-931971-30-0},
address = {Denver, CO},
pages = {101--114},
url = {https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia},
publisher = {USENIX Association},
month = jun,
}

© USENIX