Agentix: An Efficient Serving Engine for LLM Agents as General Programs

Michael Luo, University of California, Berkeley, and Google DeepMind; Xiaoxiang Shi, Shanghai Jiao Tong University; Colin Cai, Tianjun Zhang, Justin Wong, and Yichuan Wang, University of California, Berkeley; Chi Wang, Yanping Huang, and Zhifeng Chen, Google DeepMind; Joseph E. Gonzalez and Ion Stoica, University of California, Berkeley

Large language model (LLM) applications are evolving beyond simple chatbots into dynamic, general-purpose agentic programs, which scale LLM calls and output tokens to help AI agents reason, explore, and solve complex tasks. However, existing LLM serving systems ignore dependencies between programs and calls, missing significant opportunities for optimization. Our analysis reveals that programs submitted to LLM serving engines experience long cumulative wait times, primarily due to head-of-line blocking at both the individual LLM request level and the program level.

To address this, we introduce Agentix, an LLM serving system that treats programs as first-class citizens to minimize their end-to-end latencies. Agentix intercepts LLM calls submitted by programs, enriching schedulers with program-level context. We propose two scheduling algorithms—for single-threaded and distributed programs—that preempt and prioritize LLM calls based on their programs' previously completed calls. Our evaluation demonstrates that across diverse LLMs and agentic workloads, Agentix improves throughput of programs by 4-15× at the same latency compared to state-of-the-art systems, such as vLLM.
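
The abstract does not spell out the two scheduling algorithms, so the following is only a minimal sketch of the general idea of prioritizing an LLM call by what its program has already consumed. It assumes a least-attained-service style policy, which is an assumption of this sketch and not a description confirmed by the text above. The names `ProgramAwareQueue`, `submit`, `next_call`, `complete`, and `demo` are hypothetical, and preemption of running calls is not modeled.

```python
import heapq
from collections import defaultdict
from itertools import count


class ProgramAwareQueue:
    """Toy queue (hypothetical): calls from programs with less attained service run first."""

    def __init__(self):
        # program_id -> total service time consumed by that program's completed calls
        self._attained = defaultdict(float)
        self._heap = []
        self._tie = count()  # FIFO tie-breaker for calls with equal priority

    def submit(self, program_id, call_id):
        # Assumption: priority is fixed at submission from the program's completed calls.
        priority = self._attained[program_id]
        heapq.heappush(self._heap, (priority, next(self._tie), program_id, call_id))

    def next_call(self):
        # Return (program_id, call_id) of the next call to run, or None if the queue is empty.
        if not self._heap:
            return None
        _, _, program_id, call_id = heapq.heappop(self._heap)
        return program_id, call_id

    def complete(self, program_id, service_time):
        # Record consumed service so later calls from this program are deprioritized.
        self._attained[program_id] += service_time


def demo():
    q = ProgramAwareQueue()
    q.complete("long_agent", 30.0)    # this program has already consumed 30 s of service
    q.submit("long_agent", "call-7")  # submitted first, but its program has run a lot
    q.submit("new_agent", "call-1")   # program with no completed calls yet
    return [q.next_call(), q.next_call(), q.next_call()]
```

In this toy setup, the call from the program with no completed work is dequeued ahead of the earlier-submitted call from the long-running program, which is one way program-level context can reduce head-of-line blocking behind long programs.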

NSDI '26 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)

BibTeX
@inproceedings {316108,
author = {Michael Luo and Xiaoxiang Shi and Colin Cai and Tianjun Zhang and Justin Wong and Yichuan Wang and Chi Wang and Yanping Huang and Zhifeng Chen and Joseph E. Gonzalez and Ion Stoica},
title = {Agentix: An Efficient Serving Engine for {LLM} Agents as General Programs},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {2443--2459},
url = {https://www.usenix.org/conference/nsdi26/presentation/luo},
publisher = {USENIX Association},
month = may
}
