Tiara: A Scalable and Efficient Hardware Acceleration Architecture for Stateful Layer-4 Load Balancing

Authors: 

Chaoliang Zeng, Hong Kong University of Science and Technology; Layong Luo and Teng Zhang, ByteDance; Zilong Wang, Hong Kong University of Science and Technology; Luyang Li, ICT/CAS; Wenchen Han, Peking University; Nan Chen, Lebing Wan, Lichao Liu, Zhipeng Ding, Xiongfei Geng, Tao Feng, and Feng Ning, ByteDance; Kai Chen, Hong Kong University of Science and Technology; Chuanxiong Guo, ByteDance

Abstract: 

Stateful layer-4 load balancers (LB) are deployed at datacenter boundaries to distribute Internet traffic to backend real servers. To steer terabits per second traffic, traditional software LBs scale out with many expensive servers. Recent switch-accelerated LBs scale up efficiently, but fail to offload a massive number of concurrent flows into limited on-chip SRAMs.

This paper presents Tiara, a hardware architecture for stateful layer-4 LBs that aims to support a high traffic rate (> 1 Tbps), a large number of concurrent flows (> 10M), and many new connections per second (> 1M), without any assumption on traffic patterns. The three-tier architecture of Tiara makes the best use of heterogeneous hardware for stateful LBs, including a programmable switch and FPGAs for the fast path and x86 servers for the slow path. The core idea of Tiara is to divide the LB fast path into a memory-intensive task (real server selection) and a throughput-intensive task (packet encap/decap), and map them into the most suitable hardware, respectively (i.e., map real server selection into FPGA with large high-bandwidth memory (HBM) and packet encap/decap into a high-throughput programmable switch). We have implemented a fully functional Tiara prototype, and experiments show that Tiara can achieve extremely high performance (1.6 Tbps throughput, 80M concurrent flows, 1.8M new connections per second, and less than 4 us latency in the fast path) in a holistic server equipped with 8 FPGA cards, with high cost, energy, and space efficiency.

NSDI '22 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {276954,
author = {Chaoliang Zeng and Layong Luo and Teng Zhang and Zilong Wang and Luyang Li and Wenchen Han and Nan Chen and Lebing Wan and Lichao Liu and Zhipeng Ding and Xiongfei Geng and Tao Feng and Feng Ning and Kai Chen and Chuanxiong Guo},
title = {Tiara: A Scalable and Efficient Hardware Acceleration Architecture for Stateful Layer-4 Load Balancing},
booktitle = {19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)},
year = {2022},
isbn = {978-1-939133-27-4},
address = {Renton, WA},
pages = {1345--1358},
url = {https://www.usenix.org/conference/nsdi22/presentation/zeng},
publisher = {USENIX Association},
month = apr
}

Presentation Video