Yuanqing Wang, Peking University and Infinigence AI; Hao Lin, Junhao Hu, Chunyang Zhu, Quanlu Zhang, and Zhen Guo, Infinigence AI; Yuchen Zhang, Institute of Computing Technology, Chinese Academy of Sciences and Infinigence AI; Xu Fu and Si Xu, Infinigence AI; Bo Dai, Beihang University and Infinigence AI; Zixiao Huang, Tsinghua University and Infinigence AI; Chao Yu, Tsinghua University; Boxun Li, Infinigence AI; Guohao Dai, Shanghai Jiao Tong University and Infinigence AI; Zhi Yang, Peking University; Yu Wang, Tsinghua University
Modern reinforcement learning (RL) workloads, powering large language models, long-horizon reasoning, and agentic systems, exhibit extreme dynamicity due to heavy-tailed rollouts, irregular multi-turn tool interactions, and time-varying bottlenecks. Static resource allocations in today’s distributed RL systems leave large fractions of compute idle and prolong training. This paper presents DynaRL, the first RL system that dynamically reallocates computation, memory, and communication resources across heterogeneous RL components. DynaRL models the entire RL pipeline as a dynamic hypergraph that serves as a centralized, continuously evolving control surface. Supported by a unified resource migration interface and context-aware data routing, the scheduler reallocates GPUs from overprovisioned components to the current bottleneck by combining a multi-level scheduling algorithm with fine-grained resource migration. Comprehensive evaluation demonstrates that DynaRL improves end-to-end throughput on math-reasoning and agentic RL workloads by up to 1.98×, with negligible online scheduling overhead.

