WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training

Zheng Wang; Anna Cai; Xinfeng Xie; Zaifeng Pan; Yue Guan; Weiwei Chu; Jie Wang; Shikai Li; Jianyu Huang; Chris Cai; Yuchen Hao; Yufei Ding

Zheng Wang, University of California, San Diego, and Meta; Anna Cai and Xinfeng Xie, Meta; Zaifeng Pan and Yue Guan, University of California, San Diego; Weiwei Chu, Jie Wang, Shikai Li, Jianyu Huang, Chris Cai, and Yuchen Hao, Meta; Yufei Ding, University of California, San Diego, and Meta

In this work, we present WLB-LLM, a WorkLoad-Balanced 4D parallelism approach for Large Language Model training. We first analyze the workload imbalance issue in LLM training and identify two primary sources of imbalance, at the pipeline parallelism and context parallelism levels. To address the imbalance at the pipeline parallelism level, WLB-LLM incorporates a workload-aware variable-length document packing method that balances the computation and communication workload across micro-batches. At the context parallelism level, WLB-LLM introduces a novel fine-grained per-document sharding strategy that ensures each worker within a context parallelism group has an identical workload. Comprehensive experiments at different model scales demonstrate that WLB-LLM significantly mitigates workload imbalance during 4D-parallel LLM training and achieves an average speedup of 1.23× when applied in our internal LLM training framework.
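
To make the per-document sharding idea concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes causal attention (a token at position p attends to p + 1 tokens), assumes each document length is divisible by twice the context parallelism size, and uses a head/tail chunk pairing as one plausible way to equalize work. The function names `shard_document`, `causal_attention_cost`, and `per_rank_cost` are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open token range [start, end) within one document


def shard_document(doc_len: int, cp_size: int) -> List[List[Range]]:
    """Split one document into 2 * cp_size equal chunks and give rank r
    chunk r (cheap, early tokens) plus chunk 2 * cp_size - 1 - r (costly, late tokens).

    Assumption for this sketch: doc_len is divisible by 2 * cp_size.
    """
    num_chunks = 2 * cp_size
    if doc_len % num_chunks != 0:
        raise ValueError("sketch assumes doc_len divisible by 2 * cp_size")
    chunk = doc_len // num_chunks
    shards = []
    for rank in range(cp_size):
        tail = num_chunks - 1 - rank
        shards.append([
            (rank * chunk, (rank + 1) * chunk),
            (tail * chunk, (tail + 1) * chunk),
        ])
    return shards


def causal_attention_cost(ranges: List[Range]) -> int:
    """Count query-key pairs under causal attention: position p costs p + 1."""
    return sum(p + 1 for start, end in ranges for p in range(start, end))


def per_rank_cost(doc_lens: List[int], cp_size: int) -> List[int]:
    """Shard every document in a packed sequence independently and
    accumulate the attention cost assigned to each context-parallel rank."""
    costs = [0] * cp_size
    for doc_len in doc_lens:
        for rank, ranges in enumerate(shard_document(doc_len, cp_size)):
            costs[rank] += causal_attention_cost(ranges)
    return costs


if __name__ == "__main__":
    # Three documents of different lengths packed into one sequence, 4 CP ranks.
    print(per_rank_cost([16, 64, 8], cp_size=4))
```

Because every document is split across all ranks, the per-rank attention cost in this sketch is equal regardless of how document lengths are mixed within a packed sequence. Sharding the packed sequence as a whole would instead leave the balance dependent on where document boundaries fall.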

BibTeX

@inproceedings {308780,
author = {Zheng Wang and Anna Cai and Xinfeng Xie and Zaifeng Pan and Yue Guan and Weiwei Chu and Jie Wang and Shikai Li and Jianyu Huang and Chris Cai and Yuchen Hao and Yufei Ding},
title = {{WLB-LLM}: {Workload-Balanced} 4D Parallelism for Large Language Model Training},
booktitle = {19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)},
year = {2025},
isbn = {978-1-939133-47-2},
address = {Boston, MA},
pages = {785--801},
url = {https://www.usenix.org/conference/osdi25/presentation/wang-zheng},
publisher = {USENIX Association},
month = jul
}
