Scaling Distributed Machine Learning with the Parameter Server
Mu Li, Carnegie Mellon University and Baidu; David G. Andersen and Jun Woo Park, Carnegie Mellon University; Alexander J. Smola, Carnegie Mellon University and Google, Inc.; Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su, Google, Inc.
We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance.
To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
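To make the push/pull architecture the abstract describes more concrete, here is a minimal sketch. It is not the authors' implementation (the paper's system is C++ with range-based communication, vector clocks, and asynchronous messaging); the names here (ParameterServer, pull, push, the toy gradient) are hypothetical, and a single lock stands in for the server's consistency machinery.

```python
# Minimal sketch of the push/pull pattern: server nodes hold globally
# shared, sparsely keyed parameters; workers pull the keys appearing in
# their data shard, compute locally, and push updates back.
# All names and the update rule are illustrative assumptions.
import threading
from collections import defaultdict

class ParameterServer:
    """Holds the shared parameters as a sparse key -> value map."""
    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)  # sparse: absent key reads as 0.0
        self.lr = learning_rate
        self.lock = threading.Lock()       # stand-in for the real consistency model

    def pull(self, keys):
        # Return current values for the requested keys only.
        with self.lock:
            return {k: self.weights[k] for k in keys}

    def push(self, grads):
        # Apply a worker's (key, gradient) updates; in the paper this is an
        # asynchronous message handled by a user-defined function on the server.
        with self.lock:
            for k, g in grads.items():
                self.weights[k] -= self.lr * g

def worker(server, shard):
    # Each worker touches only the keys present in its shard of the data.
    for example in shard:
        w = server.pull(example.keys())                       # pull weights
        grads = {k: w[k] - x for k, x in example.items()}     # toy gradient
        server.push(grads)                                    # push updates

if __name__ == "__main__":
    ps = ParameterServer()
    shards = [[{"a": 1.0, "b": 2.0}], [{"b": 3.0, "c": 4.0}]]
    threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(dict(ps.weights))
```

In the system the paper describes, pull and push operate on key ranges rather than individual keys, run asynchronously, and the caller chooses the consistency model (eventual, bounded delay, or sequential) by declaring dependencies between tasks; the lock above collapses all of that into the strictest case.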
@inproceedings{li_mu,
  author = {Mu Li and David G. Andersen and Jun Woo Park and Alexander J. Smola and Amr Ahmed and Vanja Josifovski and James Long and Eugene J. Shekita and Bor-Yiing Su},
  title = {Scaling Distributed Machine Learning with the Parameter Server},
  booktitle = {11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)},
  year = {2014},
  isbn = {978-1-931971-16-4},
  address = {Broomfield, CO},
  pages = {583--598},
  url = {https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu},
  publisher = {USENIX Association},
  month = oct
}