Suman Karumuri, Pinterest
Speed improves customer engagement. With the emergence of micro services, it is very common for a single customer interaction, such as loading the home page or querying a search end point, to invoke hundreds of calls to dozens of back-end services. In this multi-tenant environment, traditional monitoring and profiling tools can't tell us why a specific request was slow.
Distributed tracing is the only tool available today that lets us trace a request across several systems. Using the gathered traces, we can correctly debug how a specific request is processed across the service, understand where an application spent most of its time and gain insight into why a particular request was slow.
In this talk, I will present PinTrace, our zipkin based distributed tracing infrastructure. I will also talk about the challenges of instrumenting and deploying the tracing in a polyglot micro-services architecture at scale. I will also share a few examples of how we use traces from production to debug p99 latency issues, identify unnecessary network calls and performance bottlenecks in the system. I will conclude the talk with a few use cases of distributed tracing beyond performance optimization like architectural visualization.
Suman Karumuri is the lead for distributed tracing at Pinterest. Previously, he served as the lead for Zipkin project at Twitter. He is the author of an upcoming book Distributed Tracing from O’Reilly.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.