Observability for LLMs: Understanding What’s Happening Under the Hood

Salman Munaf

Wednesday, March 26, 2025 - 4:50 pm–5:35 pm

Salman Munaf, TikTok

As LLMs and AI systems move into the core of modern products, keeping them reliable requires a new way of thinking. Monitoring large language models is fundamentally different from monitoring traditional web services. Latency and error rates alone no longer tell the full story.

This talk explores how observability changes when systems are driven by LLMs and GPU inference rather than REST APIs and CPU workloads. It breaks down the unique behaviors of these systems, including unpredictable model outputs, long context chains, token drift, embedding stores, and GPU-bound execution.

Using real-world examples, the session shows which signals actually matter, from token throughput and model latency to GPU utilization, memory pressure, and energy efficiency. Attendees will leave with a clear mental model for understanding LLM system health and a new perspective on reliability when the most critical component is a model that learns, drifts, and scales very differently from code.

Salman Munaf is a Lead Site Reliability Engineer at TikTok, where he builds and operates large-scale video infrastructure serving millions of users. He specializes in distributed systems, observability, and reliability at scale, with prior experience as a Software Engineer at Meta. Salman is passionate about helping developers embed reliability into their workflows from day one, making complex systems more resilient and easier to operate.
