Multi-view Feature-based SSD Failure Prediction: What, When, and Why

Authors: 

Yuqi Zhang and Wenwen Hao, Samsung R&D Institute China Xi'an, Samsung Electronics; Ben Niu and Kangkang Liu, Tencent; Shuyang Wang, Na Liu, and Xing He, Samsung R&D Institute China Xi'an, Samsung Electronics; Yongwong Gwon and Chankyu Koh, Samsung Electronics

Abstract: 

Solid state drives (SSDs) play an important role in large-scale data centers. SSD failures affect the stability of storage systems and cause additional maintenance overhead. To predict and handle SSD failures in advance, this paper proposes a multi-view and multi-task random forest (MVTRF) scheme. MVTRF predicts SSD failures based on multi-view features extracted from both long-term and short-term monitoring data of SSDs. In particular, it adopts multi-task learning so that a single model simultaneously predicts what type of failure will occur and when it will occur. We also extract the key decisions of MVTRF to analyze why the failure will occur. These failure details are useful for verifying and handling SSD failures. The proposed MVTRF is evaluated on large-scale real data from data centers. The experimental results show that MVTRF achieves higher failure prediction accuracy than existing schemes, improving precision by 46.1% and recall by 57.4% on average. The results also demonstrate the effectiveness of MVTRF on failure type and time prediction and on failure cause identification, which helps improve the efficiency of failure handling.
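
As a rough illustration of what features drawn from both long-term and short-term monitoring data might look like, the sketch below summarizes one monitored attribute over a long window and a short window and concatenates the two views. It is not the paper's method: the window lengths, the summary statistics, the sample data, the function names, and the label encoding are all assumptions made for illustration.

```python
from statistics import mean


def window_features(series, window):
    """Summarize the last `window` readings of one monitored attribute."""
    recent = series[-window:]
    return {
        "mean": mean(recent),
        "max": max(recent),
        "delta": recent[-1] - recent[0],  # net change across the window
    }


def multi_view_features(series, long_window=30, short_window=3):
    """Concatenate a long-term view and a short-term view into one feature dict.

    The window lengths are hypothetical defaults, not values from the paper.
    """
    features = {}
    for view, window in (("long", long_window), ("short", short_window)):
        for name, value in window_features(series, window).items():
            features[f"{view}_{name}"] = value
    return features


# Hypothetical daily error counts for one drive: flat, then a sharp rise.
daily_errors = [0] * 27 + [1, 4, 10]
features = multi_view_features(daily_errors)

# A multi-task label pairs the two prediction targets named in the abstract:
# what type of failure, and when. Both values here are placeholders.
label = ("hypothetical_failure_type", 5)  # (type, days until failure)
```

In this toy series the long view shows a low average while the short view shows a steep recent rise, which is the kind of contrast a model could only see when both views are present.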



BibTeX
@inproceedings {285760,
author = {Yuqi Zhang and Wenwen Hao and Ben Niu and Kangkang Liu and Shuyang Wang and Na Liu and Xing He and Yongwong Gwon and Chankyu Koh},
title = {Multi-view Feature-based {SSD} Failure Prediction: What, When, and Why},
booktitle = {21st USENIX Conference on File and Storage Technologies (FAST 23)},
year = {2023},
isbn = {978-1-939133-32-8},
address = {Santa Clara, CA},
pages = {409--424},
url = {https://www.usenix.org/conference/fast23/presentation/zhang},
publisher = {USENIX Association},
month = feb
}
