Finding the Needle in the Haystack: Predicting Storage Device Failures in Data Centers

Thursday, June 15, 2023 - 2:10 pm–3:05 pm

Fanjing Meng and David Cesarano, IBM


Data is a valuable asset for organizations and its growth is exponential. However, storage device failures can result in data loss, service unavailability, and economic loss. Site Reliability Engineers face significant challenges managing and monitoring the millions or billions of storage devices deployed. Existing approaches to failure prediction have limitations in accuracy, performance, and cost-effectiveness. In this talk, we will present a practical, multi-phase proactive sampling-based approach and system that addresses these challenges. We will also provide a live demonstration of the system and practices in our data center, which has a multi-tiered cloud storage pool based on various storage devices. This talk aims to encourage practical storage failure prediction research to solve real-world challenges.

Fanjing Meng, IBM

Dr. Fanjing Meng is the CTO of IBM China System Development Lab, with over 20 years of experience in cutting-edge technology research, development and management. She specializes in sustainable computing, AIOps, ITOA, cloud computing, software and solution engineering. Her current focus is on developing a sustainable computing optimization and management platform to accelerate the digital transformation of enterprises. Dr. Meng has published over 30 academic papers and holds more than 40 international patents in innovative fields. She has received over 30 awards for her contributions to technological innovation from IBM and IEEE. Additionally, she actively participates in technical and academic communities, serving as a General Chair and committee member for international conferences, and as a project leader for IEEE WIE Beijing Affiliate and a speaker for IEEE Women in Services Computing (WISC).

David Cesarano, IBM

David Cesarano is a Solutions Architect at IBM based in Phoenix, Arizona, USA. He has over 20 years of IT experience and a Bachelor of Science degree in Computer Information Systems from Northern Arizona University. He holds several data and cloud certifications and has a couple of pending patents at IBM. His current area of focus is industry and data center management.

@conference {288293,
author = {Fanjing Meng and David Cesarano},
title = {Finding the Needle in the Haystack: Predicting Storage Device Failures in Data Centers},
year = {2023},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
