Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity


Saurabh Kadekodi, K. V. Rashmi, and Gregory R. Ganger, Carnegie Mellon University


Large-scale cluster storage systems typically consist of a heterogeneous mix of storage devices with significantly varying failure rates. Despite such differences among devices, redundancy settings are generally configured in a one-scheme-for-all fashion. In this paper, we make a case for exploiting reliability heterogeneity to tailor redundancy settings to different device groups. We present HeART, an online tuning tool that guides selection of, and transitions between redundancy settings for long-term data reliability, based on observed reliability properties of each disk group. By processing disk failure data over time, HeART identifies the boundaries and steady-state failure rate for each deployed disk group (e.g., by make/model). Using this information, HeART suggests the most space-efficient redundancy option allowed that will achieve the specified target data reliability. Analysis of longitudinal failure data for a large production storage cluster shows the robustness of HeART's failure-rate determination algorithms. The same analysis shows that a storage system guided by HeART could provide target data reliability levels with fewer disks than one-scheme-for-all approaches: 11–16% fewer compared to erasure codes like 10-of-14 or 6-of-9 and 33% fewer compared to 3-way replication.

FAST '19 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {227828,
author = {Saurabh Kadekodi and K. V. Rashmi and Gregory R. Ganger},
title = {Cluster storage systems gotta have {HeART}: improving storage efficiency by exploiting disk-reliability heterogeneity},
booktitle = {17th USENIX Conference on File and Storage Technologies (FAST 19)},
year = {2019},
isbn = {978-1-939133-09-0},
address = {Boston, MA},
pages = {345--358},
url = {},
publisher = {USENIX Association},
month = feb

Presentation Video