Dorylus: Affordable, Scalable, and Accurate {GNN} Training with Distributed {CPU} Servers and Serverless Threads

John Thorpe; Yifan Qiao; Jonathan Eyolfson; Shen Teng; Guanzhou Hu; Zhihao Jia; Jinliang Wei; Keval Vora; Ravi Netravali; Miryung Kim; Guoqing Harry Xu

Authors:

John Thorpe, Yifan Qiao, Jonathan Eyolfson, and Shen Teng, UCLA; Guanzhou Hu, UCLA and University of Wisconsin, Madison; Zhihao Jia, CMU; Jinliang Wei, Google Brain; Keval Vora, Simon Fraser; Ravi Netravali, Princeton University; Miryung Kim and Guoqing Harry Xu, UCLA

Abstract:

A graph neural network (GNN) enables deep learning on structured graph data. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. This paper presents Dorylus: a distributed system for training GNNs. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost.

The key insight guiding our design is computation separation. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. Currently, for large graphs, CPU servers offer the best performance-per-dollar over GPU servers. Just using Lambdas on top of CPU servers offers up to 2.75✕ more performance-per-dollar than training only with CPU servers. Concretely, Dorylus is 1.22✕ faster and 4.83✕ cheaper than GPU servers for massive sparse graphs. Dorylus is up to 3.8✕ faster and 10.7✕ cheaper compared to existing sampling-based systems.

John Thorpe, UCLA

Yifan Qiao, UCLA

Jonathan Eyolfson, UCLA

Shen Teng, UCLA

Guanzhou Hu, UCLA and University of Wisconsin, Madison

Zhihao Jia, Carnegie Mellon University

Jinliang Wei, Google Brain

Keval Vora, Simon Fraser

Ravi Netravali, Princeton University

Miryung Kim, UCLA

Guoqing Harry Xu, UCLA

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@inproceedings {273743,
author = {John Thorpe and Yifan Qiao and Jonathan Eyolfson and Shen Teng and Guanzhou Hu and Zhihao Jia and Jinliang Wei and Keval Vora and Ravi Netravali and Miryung Kim and Guoqing Harry Xu},
title = {Dorylus: Affordable, Scalable, and Accurate {GNN} Training with Distributed {CPU} Servers and Serverless Threads},
booktitle = {15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)},
year = {2021},
isbn = {978-1-939133-22-9},
pages = {495--514},
url = {https://www.usenix.org/conference/osdi21/presentation/thorpe},
publisher = {USENIX Association},
month = jul
}

Download