Tracing Real-Time Distributed Systems

Thursday, October 3, 2019 - 14:45-15:30

Evgeny Yakimov, Bloomberg LP


The concept of distributed tracing has often been explored in the context of web-based microservices in predominantly request/response-style systems. But what if you're dealing with a real-time data streaming system? How do you even start to model strongly asynchronous message flows, consisting of multi-service pipelines that originate from many sources and are distributed to even more consumers? These are the general characteristics of trading systems, and they make tracing incredibly challenging.

This talk will explore our approach to applying these concepts to latency-sensitive real-time data streaming in large-scale distributed systems. We will discuss the challenges of tracking long-running sessions, handling fan-in/fan-out data flows, and reducing storage costs while still capturing granular in-process tracing data. We will demonstrate how we utilise tracing to diagnose issues and measure service level indicators, and share our thoughts on how to further improve observability by applying these concepts on the client side.

Evgeny is a software engineer turned SRE working at Bloomberg London with a focus on real-time distributed systems. He is a keen technology enthusiast, exploring how to apply SRE concepts such as tracing to the area of trading systems. He advocates for an SRE culture shift at Bloomberg throughout engineering and product management, utilising methods like SLIs and SLOs to put reliability at the heart of the organisation.

