FAERY: An FPGA-accelerated Embedding-based Retrieval System

Authors: 

Chaoliang Zeng, Hong Kong University of Science and Technology; Layong Luo, Qingsong Ning, Yaodong Han, and Yuhang Jiang, ByteDance; Ding Tang, Zilong Wang, and Kai Chen, Hong Kong University of Science and Technology; Chuanxiong Guo, ByteDance

Abstract: 

Embedding-based retrieval (EBR) is widely used in recommendation systems to retrieve thousands of relevant candidates from a large corpus of millions of items or more. A good EBR system needs to achieve both high throughput and low latency: high throughput usually means cost savings, and low latency improves user experience. Unfortunately, the performance of existing CPU- and GPU-based EBR systems is far from optimal due to their inherent architectural limitations.
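As a rough illustration of what an EBR query computes, the sketch below scores every corpus item against a query embedding and keeps the K best. It assumes inner-product similarity and an exhaustive (brute-force) scan, and the function names `dot` and `brute_force_retrieve` are hypothetical; it is a plain-software reference point, not FAERY's implementation.

```python
import heapq
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two embeddings (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_retrieve(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Scan the whole corpus, score each item against the query, and return
    the (score, item_id) pairs of the k highest-scoring items, best first."""
    scores = ((dot(query, emb), item_id) for item_id, emb in enumerate(corpus))
    return heapq.nlargest(k, scores)


if __name__ == "__main__":
    random.seed(0)
    dim, n_items, k = 8, 10_000, 5
    corpus = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
    query = [random.gauss(0, 1) for _ in range(dim)]
    for score, item_id in brute_force_retrieve(query, corpus, k):
        print(f"item {item_id}: score {score:.3f}")
```

The two phases visible here, a bandwidth-bound scan that computes one similarity per item and a K-selection over the resulting scores, correspond to the stages the abstract goes on to accelerate.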

In this paper, we first study how an ideal yet practical EBR system works, and then design FAERY, an FPGA-accelerated EBR system that achieves the optimal performance of this practically ideal design. FAERY is composed of three key components: it uses high-bandwidth memory (HBM) for memory-bandwidth-intensive corpus scanning, a data-parallel approach for similarity calculation, and a pipeline-based approach for K-selection. To further reduce hardware resource usage, FAERY introduces a filter that drops non-Top-K items early. Experiments show that even a degraded FAERY, limited to the same memory bandwidth as a GPU, achieves 1.21×-12.27× lower latency and up to 4.29× higher throughput than GPU-based EBR under a latency target of 10 ms.
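The abstract does not say how FAERY's filter decides which items to drop. As a software analogy only, the sketch below assumes a running threshold equal to the current K-th best score: once K candidates are held, any item scoring at or below that threshold cannot enter the Top-K and is discarded before reaching the selection stage. The function `topk_with_filter` and its `passed` counter are hypothetical names chosen for illustration; this is not the paper's hardware design.

```python
import heapq
from typing import Iterable, List, Tuple


def topk_with_filter(
    scored_items: Iterable[Tuple[float, int]], k: int
) -> Tuple[List[Tuple[float, int]], int]:
    """Streaming K-selection with an early-drop filter (illustrative sketch).

    A min-heap keeps the best k (score, item_id) pairs seen so far; its root
    is the current k-th best score. Once the heap is full, an item whose score
    does not exceed that root cannot be in the final Top-K, so the filter drops
    it without touching the heap. Returns the Top-K list (best first) and the
    number of items that passed the filter into the selection stage.
    """
    if k <= 0:
        return [], 0
    heap: List[Tuple[float, int]] = []
    passed = 0
    for score, item_id in scored_items:
        if len(heap) == k and score <= heap[0][0]:
            continue  # filtered out: cannot displace any current Top-K item
        passed += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, item_id))
        else:
            # Evict the current k-th best and insert the new candidate.
            heapq.heapreplace(heap, (score, item_id))
    return sorted(heap, reverse=True), passed
```

How much work the filter saves depends on arrival order: for randomly ordered scores only a small fraction of items pass, while for scores arriving in ascending order every item passes.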

OSDI '22 Open Access Sponsored by NetApp

BibTeX
@inproceedings {280856,
author = {Chaoliang Zeng and Layong Luo and Qingsong Ning and Yaodong Han and Yuhang Jiang and Ding Tang and Zilong Wang and Kai Chen and Chuanxiong Guo},
title = {{FAERY}: An {FPGA-accelerated} Embedding-based Retrieval System},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {841--856},
url = {https://www.usenix.org/conference/osdi22/presentation/zeng},
publisher = {USENIX Association},
month = jul
}