Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

Ruoyu Qin; Weiran He; Weixiao Huang; Yangkun Zhang; Yikai Zhao; Bo Pang; Xinran Xu; Yingdi Shan; Yongwei Wu; Mingxing Zhang

Ruoyu Qin, Moonshot AI and Tsinghua University; Weiran He, Weixiao Huang, Yangkun Zhang, Yikai Zhao, Bo Pang, and Xinran Xu, Moonshot AI; Yingdi Shan, Yongwei Wu, and Mingxing Zhang, Tsinghua University

Reinforcement Learning (RL) has emerged as a critical technique for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which dominates end-to-end iteration time, suffers from substantial long-tail latency and poor resource utilization due to inherent workload imbalance. We present Seer, a novel online context learning RL system that addresses these challenges through a key observation: requests sharing the same prompt exhibit strong similarities in output length and response patterns. Leveraging this insight, Seer introduces three coordinated techniques: (1) divided rollout for dynamic load balancing, (2) context-aware scheduling to mitigate long-tail request delays, and (3) adaptive grouped speculative decoding to accelerate generation. Together, these mechanisms substantially reduce long-tail latency and improve resource efficiency during rollout. Evaluations on production-grade RL workloads show that Seer improves end-to-end rollout throughput by up to 2.04× over state-of-the-art synchronous RL systems, while reducing long-tail latency by 72–94%.
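The key observation, that responses sampled from the same prompt tend to have similar output lengths, can be illustrated with a small scheduling sketch. This is not Seer's implementation: the `PromptGroup` structure, the use of the longest finished sibling response as a length estimate, and the rule of dispatching not-yet-probed groups first are all illustrative assumptions. The sketch shows how a scheduler might start the groups predicted to be longest first, so they are less likely to become stragglers at the tail of a rollout.

```python
from dataclasses import dataclass, field


@dataclass
class PromptGroup:
    """Hypothetical bookkeeping for all responses sampled from one prompt."""
    prompt_id: str
    pending: int  # responses in this group not yet dispatched
    finished_lengths: list[int] = field(default_factory=list)  # output lengths of completed siblings

    def estimated_length(self) -> int:
        # Assumption: siblings sharing a prompt have similar output lengths,
        # so the longest completed sibling serves as the group's estimate.
        return max(self.finished_lengths)


def dispatch_order(groups: list[PromptGroup]) -> list[str]:
    """Return prompt ids with pending work, in the order they should be dispatched."""
    def priority(g: PromptGroup) -> tuple[int, int]:
        if not g.finished_lengths:
            # No sibling has finished yet: dispatch first to obtain a length estimate.
            return (0, 0)
        # Otherwise, longest estimated output first (negated for ascending sort).
        return (1, -g.estimated_length())

    return [g.prompt_id for g in sorted(groups, key=priority) if g.pending > 0]


# Usage: "c" has no estimate yet, "b" is predicted long, "a" short, "d" has no pending work.
groups = [
    PromptGroup("a", pending=3, finished_lengths=[120, 150]),
    PromptGroup("b", pending=4, finished_lengths=[900]),
    PromptGroup("c", pending=4),
    PromptGroup("d", pending=0, finished_lengths=[2000]),
]
order = dispatch_order(groups)
print(order)  # ['c', 'b', 'a']
```

Probing unknown groups first trades a little early throughput for information; once a sibling finishes, its length lets the longest groups start early, which is the classic longest-processing-time-first heuristic for reducing makespan.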

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@inproceedings {318477,
author = {Ruoyu Qin and Weiran He and Weixiao Huang and Yangkun Zhang and Yikai Zhao and Bo Pang and Xinran Xu and Yingdi Shan and Yongwei Wu and Mingxing Zhang},
title = {Seer: Online Context Learning for Fast Synchronous {LLM} Reinforcement Learning},
booktitle = {20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26)},
year = {2026},
isbn = {978-1-939133-55-7},
address = {Seattle, WA},
pages = {883--901},
url = {https://www.usenix.org/conference/osdi26/presentation/qin},
publisher = {USENIX Association},
month = jul
}
