SRNIC: A Scalable Architecture for RDMA NICs

Zilong Wang; Layong Luo; Qingsong Ning; Chaoliang Zeng; Wenxue Li; Xinchen Wan; Peng Xie; Tao Feng; Ke Cheng; Xiongfei Geng; Tianhao Wang; Weicheng Ling; Kejia Huo; Pingbo An; Kui Ji; Shideng Zhang; Bin Xu; Ruiqing Feng; Tao Ding; Kai Chen; Chuanxiong Guo

Zilong Wang, Hong Kong University of Science and Technology; Layong Luo and Qingsong Ning, ByteDance; Chaoliang Zeng, Wenxue Li, and Xinchen Wan, Hong Kong University of Science and Technology; Peng Xie, Tao Feng, Ke Cheng, Xiongfei Geng, Tianhao Wang, Weicheng Ling, Kejia Huo, Pingbo An, Kui Ji, Shideng Zhang, Bin Xu, Ruiqing Feng, and Tao Ding, ByteDance; Kai Chen, Hong Kong University of Science and Technology; Chuanxiong Guo

RDMA is expected to be highly scalable: it should perform well in large-scale data center networks where packet losses are inevitable (i.e., high network scalability), and it should support a large number of performant connections per server (i.e., high connection scalability). Commercial RoCEv2 NICs (RNICs) fall short on scalability because they rely on a lossless, limited-scale network fabric and support only a small number of performant connections. Recent work, IRN, improves network scalability by relaxing the lossless network requirement, but the connection scalability issue remains unaddressed.

In this paper, we aim to address the connection scalability challenge while maintaining the high performance and low CPU overhead of commercial RNICs and the high network scalability of IRN, by designing SRNIC, a Scalable RDMA NIC architecture. Our key insight in SRNIC is that on-chip data structures and their memory requirements in RNICs can be minimized through careful protocol and architecture co-design to improve connection scalability. Guided by this insight, we analyze all data structures involved in an RDMA conceptual model and remove as many of them as possible through RDMA protocol header modifications and architectural innovations, including a cache-free QP scheduler and memory-free selective repeat. We implement a fully functional SRNIC prototype on an FPGA. Experiments show that SRNIC achieves 10K performant connections on chip and outperforms commercial RNICs by 18x in normalized connection scalability (i.e., the number of performant connections per 1 MB of memory), while achieving 97 Gbps throughput and 3.3 μs latency with less than 5% CPU overhead, and maintaining high network scalability.
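The abstract names memory-free selective repeat without giving details. As a rough, hypothetical illustration of why selective repeat need not require an on-chip reorder buffer, the sketch below assumes each packet carries enough addressing information to be written straight to its final location, so the receiver keeps only an expected PSN and a small arrival bitmap per QP. The class name `SelectiveRepeatReceiver`, the fixed window, the `psn * mtu` address mapping, and the ack/nack return values are all our assumptions for illustration, not SRNIC's actual design.

```python
class SelectiveRepeatReceiver:
    """Hypothetical per-QP receiver state: an expected PSN plus an arrival bitmap.

    No packets are buffered on the "NIC": each packet is placed directly into
    (simulated) host memory, assuming its target address is derivable from the
    packet itself (here, psn * mtu).
    """

    def __init__(self, window: int, buffer_size: int, mtu: int):
        self.window = window                  # max out-of-order distance tracked
        self.mtu = mtu                        # assumed fixed payload size per PSN
        self.epsn = 0                         # next in-order PSN expected
        self.bitmap = 0                       # bit i set => PSN epsn + i has arrived
        self.buffer = bytearray(buffer_size)  # stands in for the host memory region

    def on_packet(self, psn: int, payload: bytes) -> tuple[str, int]:
        offset = psn - self.epsn
        if offset < 0:
            return ("dup", self.epsn)   # already delivered: re-acknowledge
        if offset >= self.window:
            return ("drop", self.epsn)  # beyond the tracking window
        addr = psn * self.mtu           # assumption: address derived from the PSN
        if addr + len(payload) > len(self.buffer):
            return ("drop", self.epsn)  # would overrun the registered region
        # Direct placement: write the payload to its final location immediately.
        self.buffer[addr:addr + len(payload)] = payload
        self.bitmap |= 1 << offset
        # Slide the window over any contiguous run of arrived packets.
        while self.bitmap & 1:
            self.bitmap >>= 1
            self.epsn += 1
        if self.bitmap:
            return ("nack", self.epsn)  # a gap remains: epsn is the first missing PSN
        return ("ack", self.epsn)       # cumulative ACK up to epsn
```

Usage, with PSN 1 arriving late:

```python
r = SelectiveRepeatReceiver(window=8, buffer_size=32, mtu=4)
r.on_packet(0, b"aaaa")  # ("ack", 1)
r.on_packet(2, b"cccc")  # ("nack", 1): PSN 1 is missing, PSN 2 already placed
r.on_packet(1, b"bbbb")  # ("ack", 3)
```

The per-QP state is just two integers, independent of how many packets arrive out of order within the window. This is the property a design aiming to minimize on-chip memory would want, though how SRNIC itself achieves it is described in the paper, not here.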

BibTeX

@inproceedings {285104,
author = {Zilong Wang and Layong Luo and Qingsong Ning and Chaoliang Zeng and Wenxue Li and Xinchen Wan and Peng Xie and Tao Feng and Ke Cheng and Xiongfei Geng and Tianhao Wang and Weicheng Ling and Kejia Huo and Pingbo An and Kui Ji and Shideng Zhang and Bin Xu and Ruiqing Feng and Tao Ding and Kai Chen and Chuanxiong Guo},
title = {{SRNIC}: A Scalable Architecture for {RDMA} {NICs}},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {1--14},
url = {https://www.usenix.org/conference/nsdi23/presentation/wang-zilong},
publisher = {USENIX Association},
month = apr
}
