Hostping: Diagnosing Intra-host Network Bottlenecks in RDMA Servers

Authors: 

Kefei Liu, BUPT; Zhuo Jiang, ByteDance Inc.; Jiao Zhang, BUPT and Purple Mountain Laboratories; Haoran Wei, BUPT and ByteDance Inc.; Xiaolong Zhong, BUPT; Lizhuang Tan, ByteDance Inc.; Tian Pan and Tao Huang, BUPT and Purple Mountain Laboratories

Abstract: 

Intra-host networking was considered robust in the RDMA (Remote Direct Memory Access) network and received little attention. However, as the RNIC (RDMA NIC) line rate increases rapidly to multi-hundred gigabits, the intra-host network becomes a potential performance bottleneck for network applications. Intra-host network bottlenecks may result in degraded intra-host bandwidth and increased intra-host latency, which can severely impact network performance. However, when intra-host bottlenecks occur, they can hardly be noticed due to the lack of a monitoring system. Furthermore, existing bottleneck diagnosis mechanisms fail to diagnose intra-host bottlenecks efficiently. In this paper, we analyze the symptom of intra-host bottlenecks based on our longterm troubleshooting experience and propose Hostping, the first bottleneck monitoring and diagnosis system dedicated to intra-host networks. The core idea of Hostping is conducting loopback tests between RNICs and endpoints within the host to measure intra-host latency and bandwidth. Hostping not only discovers intra-host bottlenecks we already knew but also reveals six bottlenecks we did not notice before.

NSDI '23 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

This content is available to:

BibTeX
@inproceedings {285185,
author = {Kefei Liu and Zhuo Jiang and Jiao Zhang and Haoran Wei and Xiaolong Zhong and Lizhuang Tan and Tian Pan and Tao Huang},
title = {Hostping: Diagnosing Intra-host Network Bottlenecks in {RDMA} Servers},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {15--29},
url = {https://www.usenix.org/conference/nsdi23/presentation/liu-kefei},
publisher = {USENIX Association},
month = apr
}
Liu Paper (Prepublication) PDF

Presentation Video