RLinf: Flexible and Efficient Large-Scale Reinforcement Learning via Macro-to-Micro Flow Transformation

Chao Yu; Yuanqing Wang; Zhen Guo; Hao Lin; Si Xu; Hongzhi Zang; Quanlu Zhang; Yongji Wu; Chunyang Zhu; Junhao Hu; Zixiao Huang; Mingjie Wei; Yuqing Xie; Ke Yang; Bo Dai; Zhexuan Xu; Jiakun Du; Xiangyuan Wang; Xu Fu; Letong Shi; Zhihao Liu; Kang Chen; Weilin Liu; Gang Liu; Boxun Li; Jianlei Yang; Zhi Yang; Guohao Dai; Yu Wang

Chao Yu, Tsinghua University; Yuanqing Wang, Infinigence AI and Peking University; Zhen Guo, Hao Lin, and Si Xu, Infinigence AI; Hongzhi Zang, Tsinghua University; Quanlu Zhang, Infinigence AI; Yongji Wu, University of California, Berkeley; Chunyang Zhu and Junhao Hu, Infinigence AI; Zixiao Huang, Tsinghua University and Infinigence AI; Mingjie Wei, Zhongguancun Academy; Yuqing Xie, Tsinghua University; Ke Yang, Zhongguancun Academy; Bo Dai, Beihang University and Infinigence AI; Zhexuan Xu and Jiakun Du, Tsinghua University; Xiangyuan Wang, Peking University and Infinigence AI; Xu Fu and Letong Shi, Infinigence AI; Zhihao Liu, Zhongguancun Academy; Kang Chen, Peking University and Zhongguancun Academy; Weilin Liu, Infinigence AI; Gang Liu, Tsinghua University; Boxun Li, Infinigence AI; Jianlei Yang, Beihang University; Zhi Yang, Peking University; Guohao Dai, Shanghai Jiao Tong University and Infinigence AI; Yu Wang, Tsinghua University

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to low hardware utilization and slow training on existing systems. In this paper, we present RLinf, a high-performance RL training system based on our key observation that the major roadblock to efficient RL training lies in system flexibility. To maximize flexibility and efficiency, RLinf is built atop a novel RL system design paradigm called macro-to-micro flow transformation (M2Flow), which automatically breaks down high-level, easy-to-compose RL workflows along both the temporal and spatial dimensions and recomposes them into optimized execution flows. Supported by the RLinf worker’s adaptive communication capability, we devise context switching and elastic pipelining to realize M2Flow transformation, and a profiling-guided scheduling policy to generate optimal execution plans. Extensive evaluations on both reasoning RL and embodied RL tasks demonstrate that RLinf consistently outperforms state-of-the-art systems, achieving a 1.07× to 2.43× speedup in end-to-end training throughput.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@inproceedings {318471,
author = {Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Jiakun Du and Xiangyuan Wang and Xu Fu and Letong Shi and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
title = {{RLinf}: Flexible and Efficient {Large-Scale} Reinforcement Learning via {Macro-to-Micro} Flow Transformation},
booktitle = {20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26)},
year = {2026},
isbn = {978-1-939133-55-7},
address = {Seattle, WA},
pages = {829--846},
url = {https://www.usenix.org/conference/osdi26/presentation/yu-chao},
publisher = {USENIX Association},
month = jul
}
