InftyDedup: Scalable and Cost-Effective Cloud Tiering with Deduplication

Authors: 

Iwona Kotlarska, Andrzej Jackowski, Krzysztof Lichota, Michal Welnicki, and Cezary Dubnicki, 9LivesData, LLC; Konrad Iwanicki, University of Warsaw

Abstract: 

Cloud tiering is the process of moving selected data from on-premise storage to the cloud, which has recently become important for backup solutions. As subsequent backups usually contain repeating data, deduplication in cloud tiering can significantly reduce cloud storage utilization, and hence costs.

In this paper, we introduce InftyDedup, a novel system for cloud tiering with deduplication. Unlike existing solutions, it maximizes scalability by utilizing cloud services not only for storage but also for computation. Following a distributed batch approach with dynamically assigned cloud computation resources, InftyDedup can deduplicate multi-petabyte backups from multiple sources at costs on the order of a couple of dollars. Moreover, by selecting between hot and cold cloud storage based on the characteristics of each data chunk, our solution further reduces the overall costs by up to 26%–44%. InftyDedup is implemented in a state-of-the-art commercial backup system and evaluated in the cloud of a hyperscaler.
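The hot-versus-cold choice mentioned above can be illustrated with a simple per-chunk cost comparison. The sketch below is not the algorithm from the paper. It only assumes that each chunk has an expected retention time and an expected number of restores, and that the cold tier trades cheaper storage for retrieval fees and a minimum billed duration. The `Tier` fields, the `choose_tier` helper, and all prices are hypothetical placeholders, not figures from any cloud provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical price sheet for one storage tier (all values are made up)."""
    name: str
    storage_per_gb_month: float  # cost of keeping 1 GB for one month
    retrieval_per_gb: float      # cost of reading 1 GB back
    min_months: float            # minimum billed storage duration


def expected_cost(tier: Tier, size_gb: float, months: float, reads: float) -> float:
    """Expected cost of keeping a chunk in `tier` for `months` with `reads` restores."""
    # Early deletion is still billed for the tier's minimum duration.
    billed_months = max(months, tier.min_months)
    storage = size_gb * tier.storage_per_gb_month * billed_months
    retrieval = size_gb * tier.retrieval_per_gb * reads
    return storage + retrieval


def choose_tier(size_gb: float, months: float, reads: float,
                hot: Tier, cold: Tier) -> Tier:
    """Return whichever tier is cheaper for this chunk; ties go to hot."""
    hot_cost = expected_cost(hot, size_gb, months, reads)
    cold_cost = expected_cost(cold, size_gb, months, reads)
    return cold if cold_cost < hot_cost else hot


# Placeholder prices, chosen only to make the trade-off visible.
HOT = Tier("hot", storage_per_gb_month=0.020, retrieval_per_gb=0.00, min_months=0)
COLD = Tier("cold", storage_per_gb_month=0.004, retrieval_per_gb=0.03, min_months=3)

if __name__ == "__main__":
    # A chunk kept for a year and never restored.
    print(choose_tier(1.0, months=12, reads=0, hot=HOT, cold=COLD).name)
    # A chunk deleted after two weeks.
    print(choose_tier(1.0, months=0.5, reads=0, hot=HOT, cold=COLD).name)
```

Under these assumed prices, long-lived chunks that are rarely restored favor the cold tier, while short-lived or frequently restored chunks stay in the hot tier because the minimum billed duration and retrieval fees outweigh the cheaper storage rate.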

FAST '23 Open Access Sponsored by NetApp


BibTeX
@inproceedings {285780,
author = {Iwona Kotlarska and Andrzej Jackowski and Krzysztof Lichota and Michal Welnicki and Cezary Dubnicki and Konrad Iwanicki},
title = {{InftyDedup}: Scalable and {Cost-Effective} Cloud Tiering with Deduplication},
booktitle = {21st USENIX Conference on File and Storage Technologies (FAST 23)},
year = {2023},
isbn = {978-1-939133-32-8},
address = {Santa Clara, CA},
pages = {33--48},
url = {https://www.usenix.org/conference/fast23/presentation/kotlarska},
publisher = {USENIX Association},
month = feb
}
