MARS: Disaggregated Multi-Task Agentic RL Training at Scale

Wei Gao, Yuheng Zhao, and Tianyuan Wu, Hong Kong University of Science and Technology; Shaopan Xiong and Weixun Wang, Alibaba Group; Dakai An and Lunxi Cao, Hong Kong University of Science and Technology; Dilxat Muhtar, Zichen Liu, Haizhou Zhao, Ju Huang, Siran Yang, Wenbo Su, Jiamang Wang, Lin Qu, and Bo Zheng, Alibaba Group; Yongbin Li, Tongyi Lab, Alibaba; Wei Wang, Hong Kong University of Science and Technology