Boosting Full-Node Repair in Erasure-Coded Storage


Shiyao Lin, Guowen Gong, and Zhirong Shen, Xiamen University; Patrick P. C. Lee, The Chinese University of Hong Kong; Jiwu Shu, Xiamen University and Tsinghua University


As a common choice for fault tolerance in today's storage systems, erasure coding is still hampered by the induced substantial traffic in repair. A variety of erasure codes and repair algorithms are designed in recent years to relieve the repair traffic, yet we unveil via careful analysis that they are still plagued by several limitations, which restrict or even negate the performance gains. We present RepairBoost, a scheduling framework that can assist existing linear erasure codes and repair algorithms to boost the full-node repair performance. RepairBoost builds on three design primitives: (i) repair abstraction, which employs a directed acyclic graph to characterize a single-chunk repair process; (ii) repair traffic balancing, which balances the upload and download repair traffic simultaneously; and (iii) transmission scheduling, which carefully dispatches the requested chunks to saturate the most unoccupied bandwidth. Extensive experiments on Amazon EC2 show that RepairBoost can accelerate the repair by 35.0-97.1% for various erasure codes and repair algorithms.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {273767,
author = {Shiyao Lin and Guowen Gong and Zhirong Shen and Patrick P. C. Lee and Jiwu Shu},
title = {Boosting {Full-Node} Repair in {Erasure-Coded} Storage},
booktitle = {2021 USENIX Annual Technical Conference (USENIX ATC 21)},
year = {2021},
isbn = {978-1-939133-23-6},
pages = {641--655},
url = {},
publisher = {USENIX Association},
month = jul

Presentation Video