Project Adam: Building an Efficient and Scalable Deep Learning Training System

Thursday, August 7, 2014 - 3:15pm
Authors: 

Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman, Microsoft Research

Abstract: 

Large deep neural network models have recently demonstrated state-of-the-art accuracy on hard visual recognition tasks. Unfortunately, such models are extremely time-consuming to train and require large amounts of compute cycles. We describe the design and implementation of a distributed system called Adam, built from commodity server machines, that trains such models and exhibits world-class performance, scaling, and task accuracy on visual recognition tasks. Adam achieves high efficiency and scalability through whole-system co-design that optimizes and balances workload computation and communication. We exploit asynchrony throughout the system to improve performance and show that it additionally improves the accuracy of trained models. Adam is significantly more efficient and scalable than was previously thought possible: on the ImageNet 22,000-category image classification task, it used 30x fewer machines than the system that previously held the record for this benchmark to train a large 2-billion-connection model to 2x higher accuracy in comparable time. We also show that task accuracy improves with larger models. Our results provide compelling evidence that a distributed systems-driven approach to deep learning using current training algorithms is worth pursuing.
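The abstract's central systems idea, tolerating asynchrony in weight updates instead of synchronizing workers, can be illustrated with a minimal sketch of lock-free asynchronous SGD in Python. This is not Adam's actual implementation (the paper describes a distributed parameter server spanning commodity machines); the toy linear model, worker count, and hyperparameters below are illustrative assumptions only.

# Minimal sketch of lock-free asynchronous SGD, assuming a toy linear
# regression model; illustrates the style of unsynchronized update the
# abstract credits for Adam's performance, not the paper's actual code.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + noise.
w_true = rng.normal(size=10)
X = rng.normal(size=(10_000, 10))
y = X @ w_true + 0.01 * rng.normal(size=10_000)

# Shared model parameters; all workers mutate this array without locks,
# so each worker may compute gradients against slightly stale weights.
w = np.zeros(10)

def worker(w, seed, steps=5_000, lr=0.01):
    # Each worker gets its own RNG (np.random.Generator is not thread-safe).
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(X))      # sample one training example
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad                      # in-place, unsynchronized update

# Note: under CPython the GIL serializes much of this, so the sketch shows
# the lock-free structure rather than real parallel speedup.
threads = [threading.Thread(target=worker, args=(w, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to true weights:", np.linalg.norm(w - w_true))

Even though workers read slightly stale weights and their updates race, the model still converges; the paper's claim goes further, reporting that this tolerance of asynchrony improves not only throughput but also the accuracy of the trained models.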

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

  • Chilimbi PDF
  • Presentation Video
  • Presentation Audio: MP3 Download, OGG Download
