Sketching Volume Capacities in Deduplicated Storage


Danny Harnik and Moshik Hershcovitch, IBM Research; Yosef Shatsky, IBM Systems; Amir Epstein, Citi Innovation Lab TLV; Ronen Kat, IBM Research


The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the {\em reclaimable} capacity) and how much of the physical space should be attributed to each of the volumes in the system (the {\em attributed} capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data center level and obtaining insights on deduplication in the field.

FAST '19 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {227818,
author = {Danny Harnik and Moshik Hershcovitch and Yosef Shatsky and Amir Epstein and Ronen Kat},
title = {Sketching Volume Capacities in Deduplicated Storage},
booktitle = {17th USENIX Conference on File and Storage Technologies (FAST 19)},
year = {2019},
isbn = {978-1-939133-09-0},
address = {Boston, MA},
pages = {107--119},
url = {},
publisher = {USENIX Association},
month = feb

Presentation Video