Towards High-throughput and Low-latency Billion-scale Vector Search via {CPU/GPU} Collaborative Filtering and Re-ranking

Bing Tian; Haikun Liu; Yuhang Tang; Shihai Xiao; Zhuohui Duan; Xiaofei Liao; Hai Jin; Xuecang Zhang; Junhua Zhu; Yu Zhang

Bing Tian, Haikun Liu, and Yuhang Tang, Huazhong University of Science and Technology; Shihai Xiao, Huawei Technologies Co., Ltd; Zhuohui Duan, Xiaofei Liao, and Hai Jin, Huazhong University of Science and Technology; Xuecang Zhang and Junhua Zhu, Huawei Technologies Co., Ltd; Yu Zhang, Huazhong University of Science and Technology

Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. No modern ANNS system can address all of these issues simultaneously. In this paper, we present FusionANNS, a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system for billion-scale datasets that uses SSDs and only a single entry-level GPU. The key idea of FusionANNS lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across the CPUs, the GPU, and the SSDs to break through the I/O performance bottleneck. Specifically, we propose three novel designs: (1) multi-tiered indexing to avoid data swapping between the CPUs and the GPU, (2) heuristic re-ranking to eliminate unnecessary I/Os and computations while guaranteeing high accuracy, and (3) redundant-aware I/O deduplication to further improve I/O efficiency. We implement FusionANNS and compare it with SPANN, the state-of-the-art SSD-based ANNS system, and RUMMY, the state-of-the-art GPU-accelerated in-memory ANNS system. Experimental results show that FusionANNS achieves 1) 9.4-13.1× higher queries per second (QPS) and 5.7-8.8× higher cost efficiency than SPANN, and 2) 2-4.9× higher QPS and 2.3-6.8× higher cost efficiency than RUMMY, while guaranteeing low latency and high accuracy.
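The abstract describes a filter-then-re-rank pipeline only at a high level. The Python sketch below is a toy, single-process illustration built on our own assumptions, not the FusionANNS implementation. The `compressed` vectors stand in for whatever lossy copies the filtering tier scans (product quantization is a common choice, but that is an assumption here). The `raw` dict plays the role of the SSD. Vectors are assumed to be stored contiguously by ID, `per_page` to a page. The early-termination rule (stop once at most a fraction `max_change` of the top-k is new for `patience` consecutive mini-batches) is a hypothetical stand-in for the paper's heuristic re-ranking. All function and parameter names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_candidates(query: Vector, compressed: Dict[int, Vector], n: int) -> List[int]:
    """Filtering stage: score every ID against its compressed (lossy) copy
    and keep the n closest. Stand-in for CPU/GPU collaborative filtering;
    this toy version is plain Python."""
    scored = sorted((l2(query, c), vid) for vid, c in compressed.items())
    return [vid for _, vid in scored[:n]]


def pages_to_read(ids: Sequence[int], per_page: int, cached: Set[int]) -> Dict[int, List[int]]:
    """I/O deduplication: group IDs by page (hypothetical layout: per_page
    vectors per page, stored by ID) so a page holding several candidates is
    read once, and skip pages already cached in memory."""
    pages: Dict[int, List[int]] = {}
    for vid in ids:
        page = vid // per_page
        if page not in cached:
            pages.setdefault(page, []).append(vid)
    return pages


def heuristic_rerank(
    query: Vector,
    candidates: Sequence[int],
    raw: Dict[int, Vector],
    k: int,
    batch_size: int,
    per_page: int,
    max_change: float,
    patience: int,
) -> Tuple[List[int], int]:
    """Re-ranking stage: compute exact distances mini-batch by mini-batch
    (candidates assumed sorted by approximate distance) and stop once at most
    a fraction max_change of the top-k is new for `patience` consecutive
    batches. Returns (top-k IDs, number of distinct pages read)."""
    best: List[Tuple[float, int]] = []
    cached: Set[int] = set()
    stable = 0
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        # Each missing page is "read" once (raw stands in for the SSD) and cached.
        cached.update(pages_to_read(batch, per_page, cached))
        prev = {vid for _, vid in best}
        best = sorted(best + [(l2(query, raw[vid]), vid) for vid in batch])[:k]
        if prev:
            changed = len({vid for _, vid in best} - prev) / k
            stable = stable + 1 if changed <= max_change else 0
            if stable >= patience:
                break  # top-k has stabilized; skip the remaining I/O
    return [vid for _, vid in best], len(cached)
```

For example, with 32 one-dimensional vectors `raw = {i: [float(i)] for i in range(32)}`, compressed copies rounded down to multiples of 4, a query at `[0.0]`, `n=16`, `k=4`, mini-batches and pages of 4 vectors each, `max_change=0.0`, and `patience=2`, re-ranking stops after three of the four mini-batches because the top-4 set stops changing, so only three pages are read. Checking stability against the previous top-k, rather than scanning every candidate, is what lets the heuristic trade a small, tunable accuracy risk for fewer reads.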

BibTeX

@inproceedings {305214,
author = {Bing Tian and Haikun Liu and Yuhang Tang and Shihai Xiao and Zhuohui Duan and Xiaofei Liao and Hai Jin and Xuecang Zhang and Junhua Zhu and Yu Zhang},
title = {Towards High-throughput and Low-latency Billion-scale Vector Search via {CPU/GPU} Collaborative Filtering and Re-ranking},
booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},
year = {2025},
isbn = {978-1-939133-45-8},
address = {Santa Clara, CA},
pages = {171--185},
url = {https://www.usenix.org/conference/fast25/presentation/tian-bing},
publisher = {USENIX Association},
month = feb
}
