KRCORE: A Microsecond-scale RDMA Control Plane for Elastic Computing

Authors: 

Xingda Wei, Shanghai Jiao Tong University, Shanghai AI Laboratory; Fangming Lu, Shanghai Jiao Tong University; Rong Chen, Shanghai Jiao Tong University, Shanghai AI Laboratory; Haibo Chen, Shanghai Jiao Tong University

Abstract: 

We present KRCORE, an RDMA library with a microsecond-scale control plane on commodity RDMA hardware for elastic computing. KRCORE can establish a full-fledged RDMA connection within 10μs (hundreds or thousands of times faster than verbs), while only maintaining a (small) fixed-sized connection metadata at each node, regardless of the cluster scale. The key ideas include virtualizing pre-initialized kernel-space RDMA connections instead of creating one from scratch, and retrofitting advanced RDMA dynamic connected transport with static transport for both low connection overhead and high networking speed. Under load spikes, KRCORE can shorten the worker bootstrap time of an existing disaggregated key-value store (namely RACE Hashing) by 83%. In serverless computing (namely Fn), KRCORE can also reduce the latency for transferring data through RDMA by 99%.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {280746,
author = {Xingda Wei and Fangming Lu and Rong Chen and Haibo Chen},
title = {{KRCORE}: A Microsecond-scale {RDMA} Control Plane for Elastic Computing},
booktitle = {2022 USENIX Annual Technical Conference (USENIX ATC 22)},
year = {2022},
isbn = {978-1-939133-29-42},
address = {Carlsbad, CA},
pages = {121--136},
url = {https://www.usenix.org/conference/atc22/presentation/wei},
publisher = {USENIX Association},
month = jul,
}

Presentation Video