Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin, School of Computer Science, Peking University
Vector query processing powers a wide range of AI applications. While GPUs are optimized for massive vector operations, today's practice relies on CPUs to process queries for large vector datasets, due to limited GPU memory.
We present RUMMY, the first GPU-accelerated vector query processing system that achieves high performance and supports large vector datasets beyond GPU memory. The core of RUMMY is a novel reordered pipelining technique that exploits the characteristics of vector query processing to efficiently pipeline data transmission from host memory to GPU memory and query processing in GPU. Specifically, it leverages three ideas: (i) cluster-based retrofitting to eliminate redundant data transmission across queries in a batch, (ii) dynamic kernel padding with cluster balancing to maximize spatial and temporal GPU utilization for GPU computation, and (iii) query-aware reordering and grouping to optimally overlap transmission and computation. We also tailor GPU memory management for vector queries to reduce GPU memory fragmentation and cache misses. We evaluate RUMMY with a variety of billion-scale benchmarking datasets. The experimental results show that RUMMY outperforms IVF-GPU with CUDA unified memory by up to 135×. Compared to the CPU-based solution (with 64 vCPUs), RUMMY (with one NVIDIA A100 GPU) achieves up to 23.1× better performance and is up to 37.7× more cost-effective.
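Two of the ideas in the abstract can be illustrated with a small toy model: transferring each cluster once per batch instead of once per query, and overlapping host-to-GPU transfer with GPU computation. The sketch below is not RUMMY's implementation. The function names, the query-to-cluster probe lists, and the per-cluster transfer and compute times are all hypothetical values chosen for illustration, and the pipeline is modeled as plain arithmetic rather than real GPU streams.

```python
from collections import defaultdict


def group_queries_by_cluster(probes):
    """Invert a query -> clusters mapping into cluster -> queries.

    Each cluster then needs to be moved to the GPU only once per batch,
    no matter how many queries in the batch probe it.
    """
    by_cluster = defaultdict(list)
    for query_id, clusters in probes.items():
        for cluster_id in clusters:
            by_cluster[cluster_id].append(query_id)
    return dict(by_cluster)


def serial_makespan(stages):
    """Total time when every transfer and compute step runs back to back."""
    return sum(transfer + compute for transfer, compute in stages)


def pipelined_makespan(stages):
    """Total time when the transfer of the next cluster overlaps the
    computation on the current one (a two-stage pipeline)."""
    transfer_done = 0.0
    compute_done = 0.0
    for transfer, compute in stages:
        transfer_done += transfer
        # Compute starts once the data has arrived and the GPU is free.
        compute_done = max(compute_done, transfer_done) + compute
    return compute_done


# Hypothetical batch: three queries, each probing two clusters.
probes = {"q0": [3, 7], "q1": [7, 9], "q2": [3, 7]}
by_cluster = group_queries_by_cluster(probes)

naive_transfers = sum(len(clusters) for clusters in probes.values())
dedup_transfers = len(by_cluster)

# Hypothetical (transfer_time, compute_time) per cluster, arbitrary units.
stages = [(2.0, 1.0) for _ in by_cluster]

print(naive_transfers, dedup_transfers)
print(serial_makespan(stages), pipelined_makespan(stages))
```

In this toy batch, grouping by cluster halves the number of transfers, and overlapping the two stages shortens the total time further. The order in which clusters enter the pipeline also affects how much overlap is possible, which is what the query-aware reordering and grouping in the abstract targets. This model does not attempt to reproduce that scheduling policy.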
NSDI '24 Open Access sponsored by King Abdullah University of Science and Technology (KAUST)
author = {Zili Zhang and Fangyue Liu and Gang Huang and Xuanzhe Liu and Xin Jin},
title = {Fast Vector Query Processing for Large Datasets Beyond {GPU} Memory with Reordered Pipelining},
booktitle = {21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)},
year = {2024},
isbn = {978-1-939133-39-7},
address = {Santa Clara, CA},
pages = {23--40},
url = {https://www.usenix.org/conference/nsdi24/presentation/zhang-zili-pipelining},
publisher = {USENIX Association},
month = apr
}