Dayu: Fast and Low-interference Data Recovery in Very-large Storage Systems

Authors: 

Zhufan Wang and Guangyan Zhang, Tsinghua University; Yang Wang, The Ohio State University; Qinglin Yang, Tsinghua University; Jiaji Zhu, Alibaba Cloud

Abstract: 

This paper investigates I/O and failure traces from a real-world large-scale storage system. It finds that, because of the scale of the system and its imbalanced and dynamic foreground traffic, no existing recovery protocol can compute a high-quality re-replication strategy in a short time. To address this problem, the paper proposes Dayu, a timeslot-based recovery architecture. In each timeslot, Dayu schedules only the subset of tasks that are expected to finish within that timeslot; this approach reduces computation overhead and naturally copes with dynamic foreground traffic. Within each timeslot, Dayu combines a greedy algorithm with convex hull optimization to achieve both high speed and high quality. Evaluations on a 1,000-node cluster and in a 3,500-node simulation confirm that Dayu outperforms existing recovery protocols, achieving both high speed and high quality.
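
The timeslot idea in the abstract can be illustrated with a minimal sketch. This is not Dayu's actual algorithm: it omits the convex hull optimization entirely and uses a deliberately simple greedy rule. The names Task, schedule_timeslot, and budget_mb are hypothetical, and the per-node budget (the data volume a node can still move in one timeslot after foreground traffic) is an assumed input rather than something the abstract defines.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    chunk_id: int
    size_mb: float
    sources: List[str]  # nodes that still hold a replica of the chunk


def schedule_timeslot(
    pending: List[Task], budget_mb: Dict[str, float]
) -> Tuple[List[Tuple[int, str, str]], List[Task]]:
    """Pick the tasks that fit into one timeslot and defer the rest.

    budget_mb is a hypothetical estimate of how much data each node can
    still transfer in this timeslot once foreground traffic is considered.
    """
    remaining = dict(budget_mb)  # work on a copy of the caller's estimate
    scheduled: List[Tuple[int, str, str]] = []
    deferred: List[Task] = []
    for task in sorted(pending, key=lambda t: t.size_mb):
        # Greedy choice: source and destination with the most spare budget.
        src = max(task.sources, key=lambda n: remaining.get(n, 0.0), default=None)
        candidates = [n for n in remaining if n not in task.sources]
        dst = max(candidates, key=lambda n: remaining[n], default=None)
        fits = (
            src is not None
            and dst is not None
            and remaining.get(src, 0.0) >= task.size_mb
            and remaining[dst] >= task.size_mb
        )
        if fits:
            remaining[src] -= task.size_mb
            remaining[dst] -= task.size_mb
            scheduled.append((task.chunk_id, src, dst))
        else:
            deferred.append(task)  # reconsidered in the next timeslot
    return scheduled, deferred


tasks = [Task(1, 64, ["a", "b"]), Task(2, 64, ["b"]), Task(3, 200, ["b"])]
budget = {"a": 100.0, "b": 80.0, "c": 150.0, "d": 70.0}
scheduled, deferred = schedule_timeslot(tasks, budget)
```

Because deferred tasks are simply handed to the next timeslot, the budget estimate can be refreshed between calls, which is how a scheme of this shape would adapt to changing foreground traffic.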

BibTeX
@inproceedings {234988,
author = {Zhufan Wang and Guangyan Zhang and Yang Wang and Qinglin Yang and Jiaji Zhu},
title = {Dayu: Fast and Low-interference Data Recovery in Very-large Storage Systems},
booktitle = {2019 {USENIX} Annual Technical Conference ({USENIX} {ATC} 19)},
year = {2019},
isbn = {978-1-939133-03-8},
address = {Renton, WA},
pages = {993--1008},
url = {https://www.usenix.org/conference/atc19/presentation/wang-zhufan},
publisher = {{USENIX} Association},
}