On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes

Authors: 

Oleg Kolosov, School of Electrical Engineering, Tel Aviv University; Gala Yadgar, Computer Science Department, Technion and School of Electrical Engineering, Tel Aviv University; Matan Liram, Computer Science Department, Technion; Itzhak Tamo, School of Electrical Engineering, Tel Aviv University; Alexander Barg, Department of ECE/ISR, University of Maryland

Abstract: 

Erasure codes are used in large-scale storage systems to allow recovery of data from a failed node. A recently developed class of erasure codes, termed locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost.

In this study, we perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure’s LRCs, and the recently proposed Optimal-LRCs in light of two new metrics: the average degraded read cost, and the normalized repair cost. We show the tradeoff between these costs and the code’s fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates the different effects of realistic network and storage bottlenecks on the benefit from each examined LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.

BibTeX
@inproceedings {216009,
author = {Oleg Kolosov and Gala Yadgar and Matan Liram and Itzhak Tamo and Alexander Barg},
title = {On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes},
booktitle = {2018 USENIX Annual Technical Conference (USENIX ATC 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Boston, MA},
pages = {865--877},
url = {https://www.usenix.org/conference/atc18/presentation/kolosov},
publisher = {USENIX Association},
month = jul
}
