Flor: An Open High Performance RDMA Framework Over Heterogeneous RNICs

Authors: 

Qiang Li, Alibaba Group; Yixiao Gao and Xiaoliang Wang, Nanjing University; Haonan Qiu, Alibaba Group; Yanfang Le, AMD; Derui Liu, Alibaba Group; Qiao Xiang, Xiamen University; Fei Feng, Peng Zhang, Bo Li, Jianbo Dong, Lingbo Tang, Hongqiang Harry Liu, Shaozong Liu, Weijie Li, Rui Miao, Yaohui Wu, Zhiwu Wu, Chao Han, Lei Yan, Zheng Cao, and Zhongjie Wu, Alibaba Group; Chen Tian and Guihai Chen, Nanjing University; Dennis Cai, Jinbo Wu, Jiaji Zhu, and Jiesheng Wu, Alibaba Group; Jiwu Shu, Xiamen University

Abstract: 

Datacenter applications increasingly use RDMA for its ultra-low latency and low CPU overhead. However, RDMA-capable NICs (RNICs) from different vendors, and from different generations of the same vendor, do not interoperate well, which causes bandwidth imbalance in production networks. Our observation is that although the data path functions of these heterogeneous RNICs follow the same RoCEv2 specification, their control path functions are vendor- and version-specific. To this end, we propose Flor, an open framework that provides a flexible control plane in software and a unified hardware plane that adopts heterogeneous RNICs. The hardware plane requires no changes to current specifications. The software plane can run on the NPUs of RNICs, on DPUs, or on host CPUs, and on top of it we build a strengthened reliable transport over large-scale lossy Ethernet. We implemented and evaluated Flor in both testbed and production clusters over Intel E810, Mellanox CX-4 and CX-5, and Broadcom RNICs. Experiments show that Flor achieves performance comparable to vanilla RDMA in many scenarios, including 1/4096 packet loss, 6000:1 incast, and large-scale cross-pod communication. When CX-4 and CX-5 RNICs are deployed together without relying on PFC, Flor reduces the performance gap between them from 24.3% to 1.3%.
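To give a feel for what a 1/4096 packet loss rate means for a transport that must stay reliable over lossy Ethernet, the sketch below computes the chance that a multi-packet message loses at least one packet. It assumes independent per-packet loss and a 4 KB payload per packet; both are simplifying assumptions for illustration, not parameters taken from the paper, and the helper name `p_message_hit` is hypothetical.

```python
# Probability that a message of `packets` packets sees at least one drop,
# assuming each packet is lost independently with probability `loss_rate`
# (a simplifying assumption; real losses can be bursty).
def p_message_hit(loss_rate: float, packets: int) -> float:
    return 1.0 - (1.0 - loss_rate) ** packets


LOSS = 1 / 4096           # loss rate quoted in the abstract
PAYLOAD_BYTES = 4096      # assumed 4 KB payload per packet (hypothetical)

for size in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    n = -(-size // PAYLOAD_BYTES)  # ceiling division: packets per message
    print(f"{size // 1024:>6} KB, {n:>5} packets: "
          f"P(at least one loss) = {p_message_hit(LOSS, n):.3f}")
```

Under these assumptions a 1 MB message (256 packets) hits at least one loss roughly 6% of the time, so loss recovery is part of the common path rather than a rare corner case.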

OSDI '23 Open Access Sponsored by King Abdullah University of Science and Technology (KAUST)


BibTeX
@inproceedings {288578,
author = {Qiang Li and Yixiao Gao and Xiaoliang Wang and Haonan Qiu and Yanfang Le and Derui Liu and Qiao Xiang and Fei Feng and Peng Zhang and Bo Li and Jianbo Dong and Lingbo Tang and Hongqiang Harry Liu and Shaozong Liu and Weijie Li and Rui Miao and Yaohui Wu and Zhiwu Wu and Chao Han and Lei Yan and Zheng Cao and Zhongjie Wu and Chen Tian and Guihai Chen and Dennis Cai and Jinbo Wu and Jiaji Zhu and Jiesheng Wu and Jiwu Shu},
title = {Flor: An Open High Performance {RDMA} Framework Over Heterogeneous {RNICs}},
booktitle = {17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
year = {2023},
isbn = {978-1-939133-34-2},
address = {Boston, MA},
pages = {931--948},
url = {https://www.usenix.org/conference/osdi23/presentation/li-qiang},
publisher = {USENIX Association},
month = jul
}
