Ruben Barroso, Google
Traditional Root Cause Analysis (RCA) is inadequate for learning from complex software incidents because it often yields only proximal, sharp-end mitigations. For the past three years at Google, we have successfully applied Causal Analysis based on Systems Theory (CAST) to analyze major incidents. In this talk I will demonstrate the power of CAST using a real incident in which incorrect data was displayed to external users. I will contrast the findings of the original postmortem, which focused on the proximal chain of events, with the CAST analysis, which uncovered deeper, systemic environmental factors. I will show you how we use CAST to identify blunt-end systemic factors and translate those insights into recommendations that deliver durable safety improvements.

Ruben Barroso is a Staff Site Reliability Engineer (SRE) at Google. For the past five years, he has applied advanced systems safety engineering methods, including Systems-Theoretic Process Analysis (STPA) and Causal Analysis based on Systems Theory (CAST), to rigorously analyze and secure dozens of critical internal software systems at Google.

author = {Ruben Barroso},
title = {The Case of the Misnamed Cities: {CAST} Analysis of a Google Maps Incident},
year = {2026},
address = {Seattle, WA},
publisher = {USENIX Association},
month = mar
}
