Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Hao Zhang; Zeyu Zheng; Shizhen Xu; Wei Dai; Qirong Ho; Xiaodan Liang; Zhiting Hu; Jinliang Wei; Pengtao Xie; Eric P. Xing

Authors:

Hao Zhang, Carnegie Mellon University; Zeyu Zheng, Petuum Inc.; Shizhen Xu and Wei Dai, Carnegie Mellon University; Qirong Ho, Petuum Inc.; Xiaodan Liang, Zhiting Hu, Jinliang Wei, and Pengtao Xie, Carnegie Mellon University; Eric P. Xing, Petuum Inc.

Abstract:

Deep learning (DL) models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network: the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, which leads to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by plugging it into Caffe and TensorFlow. Poseidon enables Caffe and TensorFlow to achieve a 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification. Moreover, Poseidon-enabled TensorFlow achieves a 31.5x speed-up with 32 single-GPU machines on Inception-V3, a 50% improvement over the open-source TensorFlow (20x speed-up).
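To put the reported speed-ups in perspective, the short Python sketch below converts them into scaling efficiency (speed-up divided by machine count) and computes the ratio between the two Inception-V3 results. The speed-up figures and machine counts come from the abstract; the helper names `scaling_efficiency` and `relative_improvement` are hypothetical and exist only for this illustration.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# The function names are assumptions made for this sketch; only the
# speed-up numbers and machine counts are taken from the abstract.

def scaling_efficiency(speedup: float, machines: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / machines


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline` (0.5 means 50% better)."""
    return new / baseline - 1.0


# (label, reported speed-up, number of single-GPU machines)
reported = [
    ("Poseidon, VGG19-22K, 10GbE", 15.5, 16),
    ("Poseidon TensorFlow, Inception-V3", 31.5, 32),
    ("Open-source TensorFlow, Inception-V3", 20.0, 32),
]

for label, speedup, machines in reported:
    eff = scaling_efficiency(speedup, machines)
    print(f"{label}: {speedup}x on {machines} machines, efficiency {eff:.1%}")

gain = relative_improvement(31.5, 20.0)
print(f"Poseidon vs. open-source TensorFlow on Inception-V3: {gain:.1%} higher speed-up")
```

Under this definition, both Poseidon results sit close to linear scaling, while the open-source TensorFlow baseline reaches well under two thirds of it. The last line gives the exact ratio of the two Inception-V3 speed-ups, which the abstract states in rounded form as a 50% improvement.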

BibTeX

@inproceedings {203269,
author = {Hao Zhang and Zeyu Zheng and Shizhen Xu and Wei Dai and Qirong Ho and Xiaodan Liang and Zhiting Hu and Jinliang Wei and Pengtao Xie and Eric P. Xing},
title = {Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on {GPU} Clusters},
booktitle = {2017 USENIX Annual Technical Conference (USENIX ATC 17)},
year = {2017},
isbn = {978-1-931971-38-6},
address = {Santa Clara, CA},
pages = {181--193},
url = {https://www.usenix.org/conference/atc17/technical-sessions/presentation/zhang},
publisher = {USENIX Association},
month = jul
}
