DynaRL: Flexible and Dynamic Scheduling of Large-Scale Reinforcement Learning Training

Yuanqing Wang, Peking University and Infinigence AI; Hao Lin, Junhao Hu, Chunyang Zhu, Quanlu Zhang, and Zhen Guo, Infinigence AI; Yuchen Zhang, Institute of Computing Technology, Chinese Academy of Sciences and Infinigence AI; Xu Fu and Si Xu, Infinigence AI; Bo Dai, Beihang University and Infinigence AI; Zixiao Huang, Tsinghua University and Infinigence AI; Chao Yu, Tsinghua University; Boxun Li, Infinigence AI; Guohao Dai, Shanghai Jiao Tong University and Infinigence AI; Zhi Yang, Peking University; Yu Wang, Tsinghua University

Modern reinforcement learning (RL) workloads, powering large language models, long‑horizon reasoning, and agentic systems, exhibit extreme dynamicity due to heavy‑tailed rollouts, irregular multi‑turn tool interactions, and time‑varying bottlenecks. Static resource allocations in today’s distributed RL systems leave large fractions of compute idle and prolong training. This paper presents DynaRL, the first RL system that dynamically reallocates computation, memory, and communication resources across heterogeneous RL components. DynaRL models the entire RL pipeline with a dynamic hypergraph that serves as a centralized, continuously evolving control surface. Supported by a unified resource migration interface and context‑aware data routing, the scheduler reallocates GPUs from overprovisioned components to the current bottleneck via a combination of a multi-level scheduling algorithm and fine-grained resource migration. Comprehensive evaluation demonstrates that DynaRL improves end-to-end throughput on math-reasoning and agentic RL workloads by up to 1.98×, with negligible online scheduling overhead.

BibTeX

@inproceedings {318473,
author = {Yuanqing Wang and Hao Lin and Junhao Hu and Chunyang Zhu and Quanlu Zhang and Zhen Guo and Yuchen Zhang and Xu Fu and Si Xu and Bo Dai and Zixiao Huang and Chao Yu and Boxun Li and Guohao Dai and Zhi Yang and Yu Wang},
title = {{DynaRL}: Flexible and Dynamic Scheduling of {Large-Scale} Reinforcement Learning Training},
booktitle = {20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26)},
year = {2026},
isbn = {978-1-939133-55-7},
address = {Seattle, WA},
pages = {847--862},
url = {https://www.usenix.org/conference/osdi26/presentation/wang-yuanqing},
publisher = {USENIX Association},
month = jul
}
