ZeRO-Offload: Democratizing Billion-Scale Model Training

Jie Ren; Samyam Rajbhandari; Reza Yazdani Aminabadi; Olatunji Ruwase; Shuangyan Yang; Minjia Zhang; Dong Li; Yuxiong He

Authors:

Jie Ren, UC Merced; Samyam Rajbhandari, Reza Yazdani Aminabadi, and Olatunji Ruwase, Microsoft; Shuangyan Yang, UC Merced; Minjia Zhang, Microsoft; Dong Li, UC Merced; Yuxiong He, Microsoft

Abstract:

Large-scale model training has been a playground for a limited few, requiring complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes this landscape by making large model training accessible to nearly everyone. It can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model changes from data scientists or sacrificing computational efficiency.

ZeRO-Offload enables large model training by offloading data and compute to the CPU. To preserve compute efficiency, it is designed to minimize data movement to and from the GPU and to reduce CPU compute time, while maximizing memory savings on the GPU. As a result, ZeRO-Offload achieves 40 TFLOPS per GPU on a single NVIDIA V100 GPU for a 10B-parameter model, compared with 30 TFLOPS using PyTorch alone for a 1.4B-parameter model, the largest that PyTorch can train without running out of memory. ZeRO-Offload is also designed to scale across multiple GPUs when available, offering near-linear speedup on up to 128 GPUs. Additionally, it can work together with model parallelism to train models with over 70 billion parameters on a single DGX-2 box, a 4.5x increase in model size compared to using model parallelism alone.

By combining compute and memory efficiency with ease of use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.
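
The memory savings described in the abstract can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not taken from this page. It assumes mixed-precision training with Adam at about 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 weights plus the two Adam moments), and it assumes that offloading keeps only the fp16 weights on the GPU while gradients and optimizer states live in CPU memory. The function name `model_state_gb` is hypothetical, and activations and temporary buffers are ignored.

```python
# Rough model-state memory estimate for mixed-precision Adam training.
# All byte counts are assumptions for illustration, not figures from the abstract.
BYTES_FP16_PARAMS = 2   # fp16 copy of the weights
BYTES_FP16_GRADS = 2    # fp16 gradients
BYTES_OPTIMIZER = 12    # fp32 weights + Adam momentum + Adam variance (4 bytes each)

GB = 1e9  # decimal gigabytes


def model_state_gb(num_params: int, offload: bool) -> dict:
    """Return approximate GPU and CPU memory (in GB) needed for model states.

    offload=False: everything stays on the GPU.
    offload=True:  assumed split where only fp16 weights stay on the GPU and
                   gradients plus optimizer states are held in CPU memory.
    """
    offloadable = BYTES_FP16_GRADS + BYTES_OPTIMIZER
    if offload:
        gpu_bytes, cpu_bytes = BYTES_FP16_PARAMS, offloadable
    else:
        gpu_bytes, cpu_bytes = BYTES_FP16_PARAMS + offloadable, 0
    return {
        "gpu_gb": num_params * gpu_bytes / GB,
        "cpu_gb": num_params * cpu_bytes / GB,
    }


if __name__ == "__main__":
    for billions in (1.4, 10, 13):
        n = int(billions * 1e9)
        print(billions, "B params, no offload:", model_state_gb(n, offload=False))
        print(billions, "B params, offload:   ", model_state_gb(n, offload=True))
```

Under these assumptions, the model states of a 1.4B-parameter model take roughly 22 GB on the GPU without offloading, while a 13B-parameter model with offloading needs roughly 26 GB on the GPU and about 182 GB of CPU memory. This is consistent in spirit with the model sizes quoted in the abstract, though real limits also depend on activation memory and the GPU's capacity.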

BibTeX

@inproceedings {273920,
author = {Jie Ren and Samyam Rajbhandari and Reza Yazdani Aminabadi and Olatunji Ruwase and Shuangyan Yang and Minjia Zhang and Dong Li and Yuxiong He},
title = {{ZeRO-Offload}: Democratizing {Billion-Scale} Model Training},
booktitle = {2021 USENIX Annual Technical Conference (USENIX ATC 21)},
year = {2021},
isbn = {978-1-939133-23-6},
pages = {551--564},
url = {https://www.usenix.org/conference/atc21/presentation/ren-jie},
publisher = {USENIX Association},
month = jul
}
