BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Tianfeng Liu; Yangrui Chen; Dan Li; Chuan Wu; Yibo Zhu; Jun He; Yanghua Peng; Hongzheng Chen; Hongzhi Chen; Chuanxiong Guo

Authors:

Tianfeng Liu, Tsinghua University, Zhongguancun Laboratory, ByteDance; Yangrui Chen, The University of Hong Kong, ByteDance; Dan Li, Tsinghua University, Zhongguancun Laboratory; Chuan Wu, The University of Hong Kong; Yibo Zhu, Jun He, and Yanghua Peng, ByteDance; Hongzheng Chen, ByteDance, Cornell University; Hongzhi Chen and Chuanxiong Guo, ByteDance

Abstract:

Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training on large graphs with billions of nodes and edges using GPUs. The main bottlenecks lie in preparing data for the GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature-retrieval traffic. By co-designing the caching policy and the sampling order, we find a sweet spot of low overhead and a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 1.9x on average.
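To illustrate why the caching policy and the sampling order interact, here is a minimal Python sketch. It is not BGL's implementation: the FIFO eviction policy, the `FIFOFeatureCache` class, the `fetch_remote` callback, and the toy batches are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows that the same cache sees a higher hit ratio when consecutive mini-batches share nodes.

```python
from collections import OrderedDict


class FIFOFeatureCache:
    """Hypothetical fixed-capacity cache mapping node id -> feature vector.

    Eviction is first-in-first-out (an assumption, not taken from the abstract).
    """

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # called only on a cache miss
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, node_id):
        if node_id in self.store:
            self.hits += 1
            return self.store[node_id]
        self.misses += 1
        feature = self.fetch_remote(node_id)
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the oldest inserted entry
        self.store[node_id] = feature
        return feature

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(batches, capacity=4):
    """Replay mini-batches of node ids through a fresh cache; return the hit ratio."""
    cache = FIFOFeatureCache(capacity, fetch_remote=lambda n: [float(n)])
    for batch in batches:
        for node_id in batch:
            cache.get(node_id)
    return cache.hit_ratio()


# Toy data: the same four mini-batches in two different orders.
# In local_order, each batch overlaps with the one just before it.
local_order = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
scattered_order = [[0, 1, 2, 3], [6, 7, 8, 9], [2, 3, 4, 5], [4, 5, 6, 7]]

print(run(local_order))      # 0.375
print(run(scattered_order))  # 0.125
```

With a cache of four entries, the overlapping order reuses six of sixteen lookups, while the scattered order reuses only two, even though both orders touch exactly the same nodes.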

BibTeX

@inproceedings {285052,
author = {Tianfeng Liu and Yangrui Chen and Dan Li and Chuan Wu and Yibo Zhu and Jun He and Yanghua Peng and Hongzheng Chen and Hongzhi Chen and Chuanxiong Guo},
title = {{BGL}: {GPU-Efficient} {GNN} Training by Optimizing Graph Data {I/O} and Preprocessing},
booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
year = {2023},
isbn = {978-1-939133-33-5},
address = {Boston, MA},
pages = {103--118},
url = {https://www.usenix.org/conference/nsdi23/presentation/liu-tianfeng},
publisher = {USENIX Association},
month = apr
}
