STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model

Authors: 

Jin Kyu Kim and Abutalib Aghayev, Carnegie Mellon University; Garth A. Gibson, Carnegie Mellon University, Vector Institute, University of Toronto; Eric P. Xing, Petuum Inc, Carnegie Mellon University

Abstract: 

It is a daunting task for a data scientist to convert sequential code for a Machine Learning (ML) model, published by an ML researcher, to a distributed framework that runs on a cluster and operates on massive datasets. The process of fitting the sequential code to an appropriate programming model and data abstractions determined by the framework of choice requires significant engineering and cognitive effort. Furthermore, inherent constraints of frameworks sometimes lead to inefficient implementations, delivering suboptimal performance.

We show that it is possible to achieve automatic and efficient distributed parallelization of familiar sequential ML code by making a few mechanical changes to it while hiding the details of concurrency control, data partitioning, task parallelization, and fault-tolerance. To this end, we design and implement a new distributed ML framework, STRADS-Automatic Parallelization (AP), and demonstrate that it simplifies distributed ML programming significantly, while outperforming a popular data-parallel framework with a non-familiar programming model, and achieving performance comparable to an ML-specialized framework.

BibTeX
@inproceedings {234924,
author = {Jin Kyu Kim and Abutalib Aghayev and Garth A. Gibson and Eric P. Xing},
title = {{STRADS-AP}: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model},
booktitle = {2019 USENIX Annual Technical Conference (USENIX ATC 19)},
year = {2019},
isbn = {978-1-939133-03-8},
address = {Renton, WA},
pages = {207--222},
url = {https://www.usenix.org/conference/atc19/presentation/kim-jin},
publisher = {USENIX Association},
month = jul
}
