Shuyang Wang, Yuqi Zhang, and Haonan Luo, Samsung R&D Institute China Xi'an, Samsung Electronics; Kangkang Liu, Tencent; Gil Kim, JongSung Na, Claude Kim, Geunrok Oh, and Kyle Choi, Samsung Electronics; Ni Xue and Xing He, Samsung R&D Institute China Xi'an, Samsung Electronics
As SSDs become increasingly popular in enterprise data centers, SSD failures have become a key concern for storage system reliability. In this paper, we propose FailureMiner, a joint key decision mining scheme based on SSD monitoring attributes that accurately and clearly identifies SSD failure patterns in production environments. First, to address the imbalance between healthy and failed samples caused by the limited number of failed SSDs, FailureMiner introduces selective downsampling to carefully remove non-critical healthy samples, thereby focusing on the subtle differences between easily confused failure patterns and healthy patterns. Second, FailureMiner streamlines the decision-making process of the machine learning model in failure prediction by capturing key decision steps based on their joint contribution. By filtering out redundant and noisy information, FailureMiner captures joint key decisions, i.e., the simplified attribute combinations and value ranges relevant to failures, thus enabling accurate and interpretable identification of failure patterns.
FailureMiner is evaluated on real-world datasets, and the results show that our scheme improves precision and recall by an average of 38.6% and 80.5%, respectively, compared with existing failure prediction methods. The extracted joint key decisions have been deployed in Tencent's data centers to predict failures across more than 350,000 SSDs over a year, enhancing SSD reliability. The joint key decisions also reveal the failure patterns and the factors affecting SSD health, which further helps operators handle failures and manufacturers improve product reliability.
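The two ideas in the abstract, selective downsampling and simplified attribute ranges, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not state how "non-critical" healthy samples are chosen or how joint contribution is scored. The sketch assumes that a healthy sample is critical when it lies close to a failed sample, and it only shows the range-merging part of simplifying a decision path, without any contribution scoring. The function names and the attribute names `wear_level` and `realloc_sectors` are hypothetical.

```python
from math import dist


def selective_downsample(healthy, failed, keep_ratio=0.5):
    """Keep the healthy samples that lie closest to any failed sample.

    Assumption: "non-critical" is approximated by a large distance to the
    nearest failed sample. The paper's actual criterion may differ.
    """
    if not failed:
        return list(healthy)
    # Rank healthy samples by distance to their nearest failed sample.
    ranked = sorted(healthy, key=lambda h: min(dist(h, f) for f in failed))
    keep = max(1, round(len(ranked) * keep_ratio))
    return ranked[:keep]


def simplify_path(path):
    """Collapse a decision path into one value range per attribute.

    `path` is a list of (attribute, op, threshold) steps, with op being
    "<=" or ">". Repeated tests on the same attribute are merged into the
    tightest (low, high) range, where low is exclusive and high inclusive.
    """
    ranges = {}
    for attr, op, threshold in path:
        low, high = ranges.get(attr, (float("-inf"), float("inf")))
        if op == "<=":
            high = min(high, threshold)
        elif op == ">":
            low = max(low, threshold)
        else:
            raise ValueError(f"unsupported operator: {op}")
        ranges[attr] = (low, high)
    return ranges


# Toy data: two-dimensional monitoring vectors (hypothetical values).
healthy = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
failed = [(10.0, 11.0)]
kept = selective_downsample(healthy, failed, keep_ratio=0.5)

# A decision path with redundant tests on the same attributes.
path = [
    ("wear_level", ">", 80),
    ("realloc_sectors", ">", 5),
    ("wear_level", ">", 90),
    ("realloc_sectors", "<=", 40),
]
rule = simplify_path(path)
```

In this toy run, `kept` holds the two healthy vectors nearest the failed one, and `rule` maps each attribute to a single range, which is the kind of compact, human-readable condition the abstract calls a joint key decision.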

author = {Shuyang Wang and Yuqi Zhang and Haonan Luo and Kangkang Liu and Gil Kim and Jongsung Na and Claude Kim and Geunrok Oh and Kyle Choi and Ni Xue and Xing He},
title = {{FailureMiner}: A Joint Key Decision Mining Scheme for Practical {SSD} Failure Prediction and Analysis},
booktitle = {24th USENIX Conference on File and Storage Technologies (FAST 26)},
year = {2026},
isbn = {978-1-939133-53-3},
address = {Santa Clara, CA},
pages = {597--611},
url = {https://www.usenix.org/conference/fast26/presentation/wang-shuyang},
publisher = {USENIX Association},
month = feb
}
