When Cloud Storage Meets RDMA

Authors: 

Yixiao Gao, Nanjing University and Alibaba Group; Qiang Li, Lingbo Tang, Yongqing Xi, Pengcheng Zhang, Wenwen Peng, Bo Li, Yaohui Wu, Shaozong Liu, Lei Yan, Fei Feng, Yan Zhuang, Fan Liu, Pan Liu, Xingkui Liu, Zhongjie Wu, Junping Wu, and Zheng Cao, Alibaba Group; Chen Tian, Nanjing University; Jinbo Wu, Jiaji Zhu, Haiyong Wang, Dennis Cai, and Jiesheng Wu, Alibaba Group

Abstract: 

A production-level cloud storage system must be high performing and readily available. It should also meet a Service-Level Agreement (SLA). The rapid advancement in storage media has left networking lagging behind, resulting in a major performance bottleneck for new cloud storage generations. Remote Direct Memory Access (RDMA) running on lossless fabrics can potentially overcome this bottleneck. In this paper, we present our experience in introducing RDMA into the storage networks of Pangu, a cloud storage system developed by Alibaba. Since its introduction in 2009, it has proven to be crucial for Alibaba’s core businesses. In addition to the performance, availability, and SLA requirements, the deployment planning of Pangu at the production scale should consider storage volume and hardware costs. We present an RDMA-enabled Pangu system that exhibits superior performance, with the availability and SLA standards matching those of traditional TCP-backed versions. RDMA-enabled Pangu has been demonstrated to successfully serve numerous online mission-critical services across four years, including several important shopping festivals.

BibTeX
@inproceedings {262036,
title = {When Cloud Storage Meets {RDMA}},
booktitle = {18th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 21)},
year = {2021},
url = {https://www.usenix.org/conference/nsdi21/presentation/gao},
publisher = {{USENIX} Association},
month = apr,
}