VectorCDC: Accelerating Data Deduplication with Vector Instructions

Sreeharsha Udayashankar, Abdelrahman Baba, and Samer Al-Kiswany, University of Waterloo

Content-defined Chunking (CDC) algorithms dictate the overall space savings achieved by deduplication systems. However, due to their need to scan each file in its entirety, they are slow and often the main performance bottleneck within data deduplication. This paper presents VectorCDC, a method to accelerate hashless CDC using SSE/AVX CPU instructions. Our evaluation shows that VectorCDC achieves 21–46× higher throughput than existing vector acceleration techniques, without affecting the space savings achieved.
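To make "hashless CDC" concrete, the following is a minimal scalar sketch of a max-based chunker, loosely in the spirit of extremum-based algorithms such as Asymmetric Extremum. It is not the VectorCDC implementation: the function name `chunk_boundaries`, the `window` parameter, and its default value are assumptions made for illustration. The inner loop, which scans bytes while tracking a running maximum, is the kind of byte-wise comparison that SSE/AVX maximum instructions can plausibly apply to 16 or 32 bytes at a time.

```python
def chunk_boundaries(data: bytes, window: int = 64) -> list[int]:
    """Return the exclusive end offset of every chunk in `data`.

    Illustrative hashless chunker (hypothetical, not the paper's code):
    a chunk ends once `window` bytes have passed since the position of
    the largest byte seen so far in the current chunk.
    """
    boundaries = []
    start = 0
    n = len(data)
    while start < n:
        max_val = data[start]   # largest byte value seen in this chunk
        max_pos = start         # offset of that byte
        cut = n                 # default: final chunk runs to end of input
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New extremum: restart the distance count from here.
                max_val = data[i]
                max_pos = i
            elif i == max_pos + window:
                # No larger byte within `window` bytes: declare a boundary.
                cut = i + 1
                break
        boundaries.append(cut)
        start = cut
    return boundaries


if __name__ == "__main__":
    sample = bytes([5, 1, 1, 1, 9, 1, 1, 1, 1])
    print(chunk_boundaries(sample, window=3))  # [4, 8, 9]
```

Because boundaries depend only on byte values near each extremum and not on absolute file offsets, no rolling hash is computed, which is what distinguishes this family from hash-based CDC.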

BibTeX

@inproceedings {305256,
author = {Sreeharsha Udayashankar and Abdelrahman Baba and Samer Al-Kiswany},
title = {{VectorCDC}: Accelerating Data Deduplication with Vector Instructions},
booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},
year = {2025},
isbn = {978-1-939133-45-8},
address = {Santa Clara, CA},
pages = {513--522},
url = {https://www.usenix.org/conference/fast25/presentation/udayashankar},
publisher = {USENIX Association},
month = feb
}
