RobustRL: Role-based Fault Tolerance System for RL Post-Training

Zhenqian Chen and Baoquan Zhong, Zhejiang University; Xiang Li, unaffiliated; Qing Dai, Xinkui Zhao, and Miao Ye, Zhejiang University; Cheng Ren, unaffiliated; Lufei Zhang, State Key Laboratory of Mathematical Engineering and Advanced Computing, China; Jianwei Yin, Zhejiang University