BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?

Authors: 

Suhas Jayaram Subramanya and Harsha Vardhan Simhadri, Microsoft Research India; Srajan Garg, IIT Bombay; Anil Kag and Venkatesh Balasubramanian, Microsoft Research India

Abstract: 

Many large scale machine learning training and inference tasks are memory-bound rather than compute-bound. That is, on large data sets, the working set of these algorithms does not fit in memory for jobs that could run overnight on a few multi-core processors. This often forces an expensive redesign of the algorithm to distributed platforms such as parameter servers and Spark.

We propose an inexpensive and efficient alternative based on the observation that many ML tasks admit algorithms that can be programmed with linear algebra subroutines. A library that supports BLAS and sparseBLAS interface on large SSD-resident matrices can enable multi-threaded code to scale to industrial scale data sets on a single workstation.

We demonstrate that not only can such a library provide near in-memory performance for BLAS, but can also be used to write implementations of complex algorithms such as eigensolvers that outperform in-memory (ARPACK) and distributed (Spark) counterparts.

Existing multi-threaded in-memory code can link to our library with minor changes and scale to hundreds of Gigabytes of training or inference data at near in-memory processing speeds. We demonstrate this with two industrial scale use cases arising in ranking and relevance pipelines: training large scale topic models and inference for extreme multi-label learning.

This suggests that our approach could be an efficient alternative to expensive big-data compute systems for scaling up structurally complex machine learning tasks.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {226006,
author = {Suhas Jayaram Subramanya and Harsha Vardhan Simhadri and Srajan Garg and Anil Kag and Venkatesh Balasubramanian},
title = {BLAS-on-flash: An Efficient Alternative for Large Scale {ML} Training and Inference?},
booktitle = {16th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 19)},
year = {2019},
isbn = {978-1-931971-49-2},
address = {Boston, MA},
pages = {469--484},
url = {https://www.usenix.org/conference/nsdi19/presentation/subramanya},
publisher = {{USENIX} Association},
}