Project Adam: Building an Efficient and Scalable Deep Learning Training System

Thursday, August 7, 2014 - 3:15pm
Authors: 

Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman, Microsoft Research

Abstract: 

Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require large amounts of compute cycles. We describe the design and implementation of a distributed system called Adam, built from commodity server machines, that trains such models and exhibits world-class performance, scaling, and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the accuracy of trained models. Adam is significantly more efficient and scalable than was previously thought possible: on the ImageNet 22,000-category image classification task, it used 30x fewer machines than the system that previously held the record for this benchmark to train a large 2-billion-connection model to 2x higher accuracy in comparable time. We also show that task accuracy improves with larger models. Our results provide compelling evidence that a distributed systems-driven approach to deep learning using current training algorithms is worth pursuing.
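The abstract's central systems idea, tolerating asynchrony in weight updates instead of synchronizing workers, can be illustrated with a minimal sketch of lock-free asynchronous SGD in Python. This is not Adam's actual implementation (the paper describes a distributed parameter server spanning commodity machines); the toy linear model, worker count, and hyperparameters below are illustrative assumptions only.

# Minimal sketch of lock-free asynchronous SGD, assuming a toy linear
# regression model; illustrates the style of unsynchronized update the
# abstract credits for Adam's performance, not the paper's actual code.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + noise.
w_true = rng.normal(size=10)
X = rng.normal(size=(10_000, 10))
y = X @ w_true + 0.01 * rng.normal(size=10_000)

# Shared model parameters; all workers mutate this array without locks,
# so each worker may compute gradients against slightly stale weights.
w = np.zeros(10)

def worker(w, seed, steps=5_000, lr=0.01):
    # Each worker gets its own RNG (np.random.Generator is not thread-safe).
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(X))      # sample one training example
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad                      # in-place, unsynchronized update

# Note: under CPython the GIL serializes much of this, so the sketch shows
# the lock-free structure rather than real parallel speedup.
threads = [threading.Thread(target=worker, args=(w, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to true weights:", np.linalg.norm(w - w_true))

Even though workers read slightly stale weights and their updates race, the model still converges; the paper's claim goes further, reporting that this tolerance of asynchrony improves not only throughput but also the accuracy of the trained models.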

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

  • Chilimbi PDF
  • Presentation Video
  • Presentation Audio: MP3 Download, OGG Download
