Optimizing Network Performance in Distributed Machine Learning

Authors: 

Luo Mai, Imperial College London; Chuntao Hong and Paolo Costa, Microsoft Research

Abstract: 

To cope with the ever-growing availability of training data, there have been several proposals to scale machine learning computation beyond a single server and distribute it across a cluster. While this reduces training time, the observed speedup is often limited by network bottlenecks.

To address this, we design MLNET, a host-based communication layer that aims to improve the network performance of distributed machine learning systems. It achieves this through a combination of traffic reduction techniques (to diminish network load in the core and at the edges) and traffic management (to reduce average training time). A key feature of MLNET is its compatibility with existing hardware and software infrastructure, so it can be deployed immediately.

We describe the main techniques underpinning MLNET and show through simulation that overall training time can be reduced by up to 78%. While preliminary, our results indicate the critical role played by the network and the benefits of introducing a new communication layer to increase the performance of distributed machine learning systems.
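To make the traffic-reduction idea concrete, the sketch below shows one way a host-based layer can diminish network load: intermediate hosts sum worker gradients in an aggregation tree, so the parameter server receives one message per subtree instead of one per worker. This is an illustrative sketch only, not code from the paper; the names (aggregate, tree_reduce, fan_in) are hypothetical and the reduction shown is a generic tree aggregation, assumed here as a stand-in for MLNET's traffic reduction techniques.

from typing import List

def aggregate(gradients: List[List[float]]) -> List[float]:
    # Element-wise sum of gradient vectors: the reduction performed
    # at each node of the aggregation tree.
    return [sum(vals) for vals in zip(*gradients)]

def tree_reduce(worker_grads: List[List[float]], fan_in: int = 4) -> List[float]:
    # Repeatedly aggregate groups of `fan_in` gradient vectors until one
    # remains. Each round models one level of the tree: traffic at that
    # level shrinks by a factor of `fan_in` compared with all-to-one transfer.
    level = worker_grads
    while len(level) > 1:
        level = [aggregate(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

if __name__ == "__main__":
    # 16 workers, each holding a 3-element gradient vector.
    grads = [[1.0, 2.0, 3.0]] * 16
    print(tree_reduce(grads, fan_in=4))  # -> [16.0, 32.0, 48.0]

With 16 workers and a fan-in of 4, the server-facing link carries 4 messages at the last level rather than 16, illustrating why aggregation relieves congestion in the core and at the edges.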


Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings{190634,
  author = {Luo Mai and Chuntao Hong and Paolo Costa},
  title = {Optimizing Network Performance in Distributed Machine Learning},
  booktitle = {7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15)},
  year = {2015},
  address = {Santa Clara, CA},
  url = {https://www.usenix.org/conference/hotcloud15/workshop-program/presentation/mai},
  publisher = {USENIX Association},
  month = jul,
}
Download
  • Mai PDF
  • View the slides