Beyond Distributed Tracing

Tuesday, March 15, 2022 - 4:45 pm–5:05 pm

Kyusoon Lee, Google


In the era of microservice architecture, distributed tracing solutions offer visibility across services. However, this visibility is at the level of individual requests, failing to deliver any sort of aggregated observability to average users.

In this talk, we introduce a novel yet simple method (“CUI attribution”) of creating an aggregated end-to-end view at the level of what we call “Critical User Interaction (CUI)” (e.g. “play a video”, or “purchase an item”) using the baggage mechanism from Google’s Census.

The aggregated end-to-end view is intuitive for average users to grok, reducing time to root-cause failures and outages. The method is applicable to many other areas, such as dependency analysis and fault-tolerance testing in production. Any open source projects or enterprise distributed tracing solutions that support a similar baggage mechanism can easily adopt our method with little effort to offer richer insights to their users.
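To make the idea concrete, here is a minimal sketch of how a CUI label carried in baggage could produce an aggregated per-CUI view. It is an illustration under stated assumptions, not the speaker's implementation and not the Census API: the baggage is simulated with Python's `contextvars`, the services are plain in-process functions, and every name (`start_cui`, `record`, `services_for`, the service names) is hypothetical.

```python
import contextvars
from collections import defaultdict

# Hypothetical stand-in for a tracing library's baggage: key/value pairs
# that travel with a request across service boundaries.
_baggage = contextvars.ContextVar("baggage", default={})

# (cui, service) -> [total calls, failed calls]. A real system would export
# these as metrics rather than keep them in memory.
stats = defaultdict(lambda: [0, 0])


def start_cui(name, handler):
    """Tag the request with its CUI at the entry point, then run it."""
    token = _baggage.set({"cui": name})
    try:
        return handler()
    finally:
        _baggage.reset(token)


def record(service, ok=True):
    """Called inside each service: attribute this call to the current CUI."""
    cui = _baggage.get().get("cui", "unattributed")
    stats[(cui, service)][0] += 1
    if not ok:
        stats[(cui, service)][1] += 1


# Hypothetical services, invoked in-process to keep the sketch runnable.
def auth():
    record("auth")


def video_backend():
    record("video-backend", ok=False)  # simulate a failing dependency


def payments():
    record("payments")


def play_video():
    auth()
    video_backend()


def purchase_item():
    auth()
    payments()


def services_for(cui):
    """Aggregated view: which services a given CUI depends on."""
    return sorted(s for (c, s) in stats if c == cui)


start_cui("play a video", play_video)
start_cui("purchase an item", purchase_item)
```

In this sketch, `services_for("play a video")` lists the services that interaction touched, and the simulated failure in `video-backend` is counted against "play a video" only, which is the kind of per-interaction aggregation the abstract describes.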

Kyusoon Lee is a Site Reliability Engineer at Google whose passion lies in gaining visibility into internal systems and applying it to improve reliability for external users.

Since 2019, he has been leading CUI attribution efforts, with a primary focus on applying it to automated root-causing, impact assessment, and outage prevention. He currently drives a few long-term cross-org technical roadmaps at Google based on CUI attribution, while continuing to explore its value through experimentation.

He would love to exchange experiences and insights with anyone.

