FLB: Fine-grained Load Balancing for Lossless Datacenter Networks

Jinbin Hu, Central South University, Hong Kong University of Science and Technology, Changsha University of Science and Technology; Wenxue Li, Xiangzhou Liu, Junfeng Wang, and Bowen Liu, Hong Kong University of Science and Technology; Ping Yin, Inspur; Jianxin Wang and Jiawei Huang, Central South University; Kai Chen, Hong Kong University of Science and Technology

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE), combined with Priority Flow Control (PFC), has been widely deployed in production datacenters to enable low-latency, lossless transmission. At the same time, modern datacenters typically offer parallel transmission paths between any pair of end-hosts, underscoring the importance of load balancing. However, the well-studied load balancing mechanisms designed for lossy datacenter networks (DCNs) are ill-suited for such lossless environments.

Through extensive experiments, we are among the first to comprehensively examine the interactions between PFC and load balancing. We find that existing fine-grained rerouting schemes can be counterproductive: they spread congested flows across more paths, further aggravating PFC’s head-of-line (HoL) blocking. Motivated by this, we present FLB, a Fine-grained Load Balancing scheme for lossless DCNs. At its core, FLB employs threshold-free rerouting to balance traffic load and improve link utilization under normal conditions, and it leverages timely isolation of congested flows to eliminate HoL blocking of non-congested flows when congestion occurs. We have fully implemented an FLB prototype, and our evaluation results show that FLB reduces the PFC PAUSE rate by up to 96% and avoids HoL blocking, translating to up to 45% improvement in goodput over CONGA+DCQCN and 40%, 36%, 29% and 18% reductions in average flow completion time (FCT) over LetFlow+Swift, MP-RDMA, Proteus+DCQCN and LetFlow+PCN, respectively.
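
To make the two ideas in the abstract concrete, the following is a minimal toy sketch in Python, not the FLB algorithm itself. It assumes a hypothetical `ToySwitch` with a per-path backlog counter, a hypothetical `mark_congested` call standing in for whatever congestion detection the paper uses, and a simple "least-backlogged path" rule standing in for fine-grained rerouting. It only illustrates the intuition that confining a congested flow to one path keeps the remaining paths free for non-congested flows.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Toy model of a switch choosing among parallel paths (illustrative only)."""
    num_paths: int
    queued_bytes: list = field(init=False)        # per-path backlog in bytes
    pinned: dict = field(default_factory=dict)    # congested flow id -> path index

    def __post_init__(self):
        self.queued_bytes = [0] * self.num_paths

    def mark_congested(self, flow_id, path):
        # Assumption: some external mechanism has identified this flow as
        # congested; we isolate it by confining it to a single path.
        self.pinned[flow_id] = path

    def route(self, flow_id, size):
        if flow_id in self.pinned:
            # Congested flows are never spread over additional paths.
            path = self.pinned[flow_id]
        else:
            # Non-congested flows avoid paths that carry isolated flows and
            # pick the least-backlogged remaining path for every packet.
            isolated = set(self.pinned.values())
            candidates = [p for p in range(self.num_paths) if p not in isolated]
            if not candidates:
                candidates = list(range(self.num_paths))
            path = min(candidates, key=lambda p: self.queued_bytes[p])
        self.queued_bytes[path] += size
        return path


if __name__ == "__main__":
    sw = ToySwitch(num_paths=4)
    sw.mark_congested("incast", 0)
    print([sw.route("incast", 1000) for _ in range(3)])
    print([sw.route("background", 1000) for _ in range(6)])
    print(sw.queued_bytes)
```

In this sketch the congested flow stays on path 0 while background packets rotate over the other three paths. A scheme that instead sprayed the congested flow over all four paths would build backlog on every path, which is the situation the abstract describes as aggravating HoL blocking.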

USENIX ATC '25 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)


BibTeX
@inproceedings {308456,
author = {Jinbin Hu and Wenxue Li and Xiangzhou Liu and Junfeng Wang and Bowen Liu and Ping Yin and Jianxin Wang and Jiawei Huang and Kai Chen},
title = {{FLB}: Fine-grained Load Balancing for Lossless Datacenter Networks},
booktitle = {2025 USENIX Annual Technical Conference (USENIX ATC 25)},
year = {2025},
isbn = {978-1-939133-48-9},
address = {Boston, MA},
pages = {365--380},
url = {https://www.usenix.org/conference/atc25/presentation/hu-jinbin},
publisher = {USENIX Association},
month = jul
}