Thiara Ortiz, Netflix
In this talk, I'll discuss how we monitor service health by treating services as black boxes. SREs often face ambiguity, and I'll show how we use multiple measurement techniques to understand system behavior, aligning with the need for robust observability tools. These strategies are crucial for system reliability and user experience: by proactively identifying and resolving issues, we ensure a smoother playback experience and maintain user trust, even as the platform continues to evolve and mature. The principles shared in this talk can be extended to other applications, such as AI reliability in data quality and model deployments.

Thiara is a Cloud Gaming SRE Manager at Netflix. For the last five years, Thiara has worked on Open Connect, improving the resilience of the Netflix service for members around the world. Most recently, Thiara has been heavily involved in the introduction of Cloud Gaming on the Netflix platform.

author = {Thiara Ortiz},
title = {Gaining Insights from a Black Box System},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}