Beware of Fragmentation: Scheduling {GPU-Sharing} Workloads with Fragmentation Gradient Descent

Qizhen Weng; Lingyun Yang; Yinghao Yu; Wei Wang; Xiaochuan Tang; Guodong Yang; Liping Zhang

Qizhen Weng and Lingyun Yang, Hong Kong University of Science and Technology; Yinghao Yu, Alibaba Group and Hong Kong University of Science and Technology; Wei Wang, Hong Kong University of Science and Technology; Xiaochuan Tang, Guodong Yang, and Liping Zhang, Alibaba Group

Large tech companies are piling up a massive number of GPUs in their server fleets to run diverse machine learning (ML) workloads. However, these expensive devices often suffer from significant underutilization. To tackle this issue, GPU sharing techniques have been developed to enable multiple ML tasks to run on a single GPU. Nevertheless, our analysis of Alibaba production traces reveals that allocating partial GPUs can result in severe GPU fragmentation in large clusters, leaving hundreds of GPUs unable to be allocated. Existing resource packing algorithms fall short in addressing this problem, as GPU sharing mandates a new scheduling formulation beyond the classic bin packing.

In this paper, we propose a novel measure of fragmentation to statistically quantify the extent of GPU fragmentation caused by different sources. Building upon this measure, we propose to schedule GPU-sharing workloads towards the direction of the steepest descent of fragmentation, an approach we call Fragmentation Gradient Descent (FGD). Intuitively, FGD packs tasks to minimize the growth of GPU fragmentation, thereby achieving the maximum GPU allocation rate. We have implemented FGD as a new scheduler in Kubernetes and evaluated its performance using production traces on an emulated cluster comprising more than 6,200 GPUs. Compared to the existing packing-based schedulers, FGD reduces unallocated GPUs by up to 49%, resulting in the utilization of additional 290 GPUs.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

This content is available to:

Conference attendees

BibTeX

@inproceedings {288717,
author = {Qizhen Weng and Lingyun Yang and Yinghao Yu and Wei Wang and Xiaochuan Tang and Guodong Yang and Liping Zhang},
title = {Beware of Fragmentation: Scheduling {GPU-Sharing} Workloads with Fragmentation Gradient Descent},
booktitle = {2023 USENIX Annual Technical Conference (USENIX ATC 23)},
year = {2023},
isbn = {978-1-939133-35-9},
address = {Boston, MA},
pages = {995--1008},
url = {https://www.usenix.org/conference/atc23/presentation/weng},
publisher = {USENIX Association},
month = jul
}

Download

Weng PDF

View the slides

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent

Open Access Media

This content is available to:

Presentation Video