HeteCCL: Synthesizing Near-Optimal Collective Communication Schedules for Heterogeneous GPU Clusters

Chenyang Hei, Fuliang Li, and Jiayi Li, Northeastern University; Jiamin Cao, Alibaba Cloud; Chengxi Gao, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Xiuzhu Sha, Tongrui Liu, and Dengke Zhang, Northeastern University; Ennan Zhai, Alibaba Cloud; Xingwei Wang, Northeastern University

Training large language models demands massive computing and networking resources. However, existing clusters often face shortages of homogeneous resources and vendor lock-in, forcing the use of heterogeneous hardware, which makes synchronizing training across nodes highly challenging. Current solutions to cluster heterogeneity suffer from low collective communication efficiency, with suboptimal scheduling and slow algorithm synthesis. We present HeteCCL, a unified method for generating near-optimal collective communication schedules on heterogeneous clusters. HeteCCL models the cluster topology and link bandwidth in detail, quantizes data chunks at the schedule-step level, and formulates the scheduling problem as a maximum parallel transfer problem on a weighted directed graph. To accelerate synthesis, HeteCCL encodes bandwidth and routing constraints as SMT formulas and applies counterexample-guided inductive synthesis to iteratively refine the constraints and prune the search space. Experiments on heterogeneous testbeds, each consisting of 32 H20 and V100 GPUs, show that HeteCCL outperforms NCCL, TACCL, and TE-CCL, achieving up to 2.8×, 4.4×, and 2.6× higher bandwidth, respectively. It also accelerates synthesis by up to two orders of magnitude compared to state-of-the-art approaches, and improves end-to-end training efficiency by 23%–37%.
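The step-level view of scheduling described in the abstract can be illustrated with a small sketch. This is not HeteCCL's synthesizer: it replaces the SMT encoding and counterexample-guided synthesis with a plain greedy heuristic, and every name in it (`greedy_allgather`, `links`, the integer "chunks per step" capacities) is a hypothetical choice made for illustration. It assumes link bandwidth has already been quantized to a whole number of chunks per schedule step, and it builds an all-gather schedule on a weighted directed graph by sending as many useful chunks as each link allows in every step.

```python
from collections import defaultdict


def greedy_allgather(links, num_nodes, max_steps=64):
    """Build a step-by-step all-gather schedule with a greedy heuristic.

    links: {(src, dst): capacity}, where capacity is the (assumed, already
           quantized) number of chunks the link can carry in one step.
    Node i starts with chunk i; the goal is for every node to hold all chunks.
    Returns a list of steps, each a list of (src, dst, chunk) transfers.
    """
    have = {n: {n} for n in range(num_nodes)}
    schedule = []
    for _ in range(max_steps):
        if all(len(chunks) == num_nodes for chunks in have.values()):
            return schedule
        step = []
        incoming = defaultdict(set)  # chunks already headed to each node this step
        for (src, dst), capacity in sorted(links.items()):
            # Chunks src held at the start of the step that dst still lacks
            # and that no other link is already delivering in this step.
            wanted = sorted(have[src] - have[dst] - incoming[dst])
            for chunk in wanted[:capacity]:
                step.append((src, dst, chunk))
                incoming[dst].add(chunk)
        if not step:
            raise ValueError("no progress: some node cannot be reached")
        # Apply transfers only after the step is fixed, so a chunk received
        # in this step cannot be forwarded until the next one.
        for _, dst, chunk in step:
            have[dst].add(chunk)
        schedule.append(step)
    raise ValueError("max_steps exceeded")


# Hypothetical 4-node bidirectional ring: the 0<->1 link is twice as fast
# as the others, standing in for a heterogeneous interconnect.
links = {
    (0, 1): 2, (1, 0): 2,
    (1, 2): 1, (2, 1): 1,
    (2, 3): 1, (3, 2): 1,
    (3, 0): 1, (0, 3): 1,
}
schedule = greedy_allgather(links, num_nodes=4)
for i, step in enumerate(schedule):
    print(f"step {i}: {step}")
```

A greedy pass like this gives no optimality guarantee; the abstract's formulation instead searches for schedules that maximize parallel transfers under bandwidth and routing constraints, which is where a solver-based approach would come in.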

BibTeX
@inproceedings {316626,
author = {Chenyang Hei and Fuliang Li and Jiayi Li and Jiamin Cao and Chengxi Gao and Xiuzhu Sha and Tongrui Liu and Dengke Zhang and Ennan Zhai and Xingwei Wang},
title = {{HeteCCL}: Synthesizing {Near-Optimal} Collective Communication Schedules for Heterogeneous {GPU} Clusters},
booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
year = {2026},
isbn = {978-1-939133-54-0},
address = {Renton, WA},
pages = {2533--2551},
url = {https://www.usenix.org/conference/nsdi26/presentation/hei},
publisher = {USENIX Association},
month = may
}
