VectorCDC: Accelerating Data Deduplication with Vector Instructions

Authors: 

Sreeharsha Udayashankar, Abdelrahman Baba, and Samer Al-Kiswany, University of Waterloo

Abstract: 

Content-defined Chunking (CDC) algorithms dictate the overall space savings achieved by deduplication systems. However, due to their need to scan each file in its entirety, they are slow and often the main performance bottleneck within data deduplication. This paper presents VectorCDC, a method to accelerate hashless CDC using SSE/AVX CPU instructions. Our evaluation shows that VectorCDC achieves 21–46× higher throughput than existing vector acceleration techniques, without affecting the space savings achieved.
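To make "hashless CDC" concrete, here is a plain scalar sketch of one well-known hashless algorithm, Asymmetric Extremum (AE). It is an illustration under our own assumptions, not VectorCDC's implementation: the function name `ae_chunk_boundaries` and its `window` parameter are hypothetical. Instead of computing a rolling hash over a sliding window, AE tracks the largest byte seen in the current chunk and declares a cut point once `window` further bytes have passed without a larger one.

```python
def ae_chunk_boundaries(data: bytes, window: int) -> list[int]:
    """Split `data` with Asymmetric Extremum (AE) chunking (illustrative sketch).

    Returns the exclusive end offset of every chunk. A chunk ends at the
    byte lying exactly `window` positions after the current maximum byte,
    provided no strictly larger byte appeared in between. No hash is computed.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    boundaries: list[int] = []
    start = 0
    n = len(data)
    while start < n:
        # Running extreme point for the chunk that begins at `start`.
        max_val, max_pos = data[start], start
        cut = n  # if no cut point is found, the chunk runs to the end of data
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New maximum: the fixed-size window restarts after it.
                max_val, max_pos = data[i], i
            elif i == max_pos + window:
                # Window after the maximum is full: byte i ends the chunk.
                cut = i + 1
                break
        boundaries.append(cut)
        start = cut
    return boundaries


if __name__ == "__main__":
    print(ae_chunk_boundaries(bytes([5, 1, 2, 3, 0, 9, 1, 1]), window=3))
```

Running it prints `[4, 8]`: the first chunk ends three bytes after the maximum `5`, and the second chunk reaches the end of the input before the window after `9` fills. Each inner-loop step is a byte comparison against a running maximum, the kind of data-parallel work that SSE/AVX compare and max instructions can apply to many bytes at once; this scalar sketch does not attempt to reproduce how VectorCDC vectorizes such scans.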

FAST '25 Open Access Sponsored by NetApp


BibTeX
@inproceedings {305256,
author = {Sreeharsha Udayashankar and Abdelrahman Baba and Samer Al-Kiswany},
title = {{VectorCDC}: Accelerating Data Deduplication with Vector Instructions},
booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},
year = {2025},
isbn = {978-1-939133-45-8},
address = {Santa Clara, CA},
pages = {513--522},
url = {https://www.usenix.org/conference/fast25/presentation/udayashankar},
publisher = {USENIX Association},
month = feb
}