StriaTrace: Efficient Tracing and Diagnosis for Online LLM Inference (Operational Systems)

Haonan Wu; Yanqing Chen; Kun Qian; Xue Li; Jingbo Xu; Erci Xu; Ennan Zhai; Wenyuan Yu; Guangtao Xue; Jingren Zhou

Haonan Wu, Shanghai Jiao Tong University and Alibaba Group; Yanqing Chen, Kun Qian, Xue Li, and Jingbo Xu, Alibaba Group; Erci Xu, Shanghai Jiao Tong University; Ennan Zhai and Wenyuan Yu, Alibaba Group; Guangtao Xue, Shanghai Jiao Tong University and Shanghai Key Laboratory of Trusted Data Circulation and Governance and Web3; Jingren Zhou, Alibaba Group

Large Language Model (LLM) inference services in production operate under stringent, fine-grained Service Level Objectives (SLOs). Unlike throughput-oriented LLM training, online inference is latency-sensitive, so even sporadic performance anomalies can violate SLOs; this calls for better tracing and diagnosis solutions. However, existing solutions face two primary limitations: (1) existing tracing tools incur prohibitive overhead, and (2) training-centric diagnosis tools are ill-suited to capturing sporadic inference anomalies. To bridge these gaps, we propose StriaTrace, a novel tracing and diagnosis system tailored for online LLM inference. StriaTrace is built on three principles distilled from production experience: (1) trace key synchronization points, (2) trace critical paths, and (3) trace in detail only during abnormalities. StriaTrace further builds a dynamic, regression-based roofline model and applies correlation-based diagnosis to identify the root cause of each inference abnormality. Evaluations show that StriaTrace reduces tracing overhead by 97.8% relative to alternatives. StriaTrace has been widely used across our development, testing, and production release cycles, and has diagnosed hundreds of abnormalities spanning 19 distinct root causes.

Category: Operational Systems Paper

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@inproceedings {318447,
author = {Haonan Wu and Yanqing Chen and Kun Qian and Xue Li and Jingbo Xu and Erci Xu and Ennan Zhai and Wenyuan Yu and Guangtao Xue and Jingren Zhou},
title = {{StriaTrace}: Efficient Tracing and Diagnosis for Online {LLM} Inference (Operational Systems)},
booktitle = {20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26)},
year = {2026},
isbn = {978-1-939133-55-7},
address = {Seattle, WA},
pages = {627--645},
url = {https://www.usenix.org/conference/osdi26/presentation/wu-haonan},
publisher = {USENIX Association},
month = jul
}
