Proactive error prediction to improve storage system reliability

Authors: 

Farzaneh Mahdisoltani, University of Toronto; Ioan Stefanovici, Microsoft Research; Bianca Schroeder, University of Toronto

Abstract: 

This paper proposes the use of machine learning techniques to make storage systems more reliable in the face of sector errors. Sector errors are partial drive failures, where individual sectors on a drive become unavailable, and occur at a high rate in both hard disk drives and solid state drives. The data in the affected sectors can only be recovered through redundancy in the system (e.g. another drive in the same RAID) and is lost if the error is encountered while the system operates in degraded mode, e.g. during RAID reconstruction.

In this paper, we explore a range of different machine learning techniques and show that sector errors can be predicted ahead of time with high accuracy. Prediction is robust, even when only little training data or only training data for a different drive model is available. We also discuss a number of possible use cases for improving storage system reliability through the use of sector error predictors. We evaluate one such use case in detail: We show that the mean time to detecting errors (and hence the window of vulnerability to data loss) can be greatly reduced by adapting the speed of a scrubber based on error predictions.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {203217,
author = {Farzaneh Mahdisoltani and Ioan Stefanovici and Bianca Schroeder},
title = {Proactive error prediction to improve storage system reliability},
booktitle = {2017 USENIX Annual Technical Conference (USENIX ATC 17)},
year = {2017},
isbn = {978-1-931971-38-6},
address = {Santa Clara, CA},
pages = {391--402},
url = {https://www.usenix.org/conference/atc17/technical-sessions/presentation/mahdisoltani},
publisher = {USENIX Association},
month = jul
}

Presentation Audio