GPREEMPT: GPU Preemptive Scheduling Made General and Efficient

Ruwen Fan and Tingxu Ren, Tsinghua University; Minhui Xie, Renmin University of China; Shiwei Gao, Jiwu Shu, and Youyou Lu, Tsinghua University

GPUs support various workloads with different peak periods and diverse service-level agreement (SLA) requirements, including latency-critical tasks and best-effort tasks. Co-locating tasks with diverse SLA demands can enhance resource utilization, yet it introduces the risk of performance interference. Prior work employs preemption strategies to enforce SLAs for latency-critical tasks. These strategies fall into two categories: wait-based and reset-based. The wait-based strategy ensures broad generality but incurs significant preemption latency. In contrast, the reset-based strategy requires preempted kernels to be idempotent, which limits its generality.

This paper presents GPreempt, a preemption mechanism that breaks this trade-off between generality and preemption latency. GPreempt implements a timeslice-based yield mechanism to enable context-switch preemption on GPUs. To mitigate the overhead of context switching, GPreempt employs a hint-based pre-preemption technique that overlaps the preemption process with the essential data-preparation phase. Our evaluation demonstrates that GPreempt achieves low-latency preemption (within 40 μs), comparable to executing only latency-critical tasks, while remaining applicable to non-idempotent workloads, for which reset-based mechanisms are inadequate.
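
The benefit of overlapping preemption with data preparation can be illustrated with a small timing model. This is a minimal sketch, not GPreempt's implementation: the function name `start_delay_us` and the two durations are hypothetical values chosen for illustration, and the model assumes the hint arrives at the same moment data preparation begins.

```python
def start_delay_us(data_prep_us: float, preempt_us: float, overlap: bool) -> float:
    """Time from request arrival until the latency-critical kernel can start.

    Hypothetical model: the kernel needs both its input data prepared and
    the GPU freed from the best-effort task before it can launch.
    """
    if overlap:
        # Preemption is triggered by a hint at arrival and proceeds
        # concurrently with data preparation, so the longer of the two
        # phases determines the delay.
        return max(data_prep_us, preempt_us)
    # Without a hint, preemption starts only after data preparation ends,
    # so the two phases add up.
    return data_prep_us + preempt_us


# Illustrative durations only (assumptions, not measurements from the paper).
DATA_PREP_US = 100.0
PREEMPT_US = 80.0

sequential = start_delay_us(DATA_PREP_US, PREEMPT_US, overlap=False)
overlapped = start_delay_us(DATA_PREP_US, PREEMPT_US, overlap=True)
print(f"sequential: {sequential} us, overlapped: {overlapped} us")
```

Under this model, the context-switch cost is hidden entirely whenever it is no longer than the data-preparation phase; otherwise only the excess remains visible to the latency-critical task.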

USENIX ATC '25 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)


BibTeX
@inproceedings {308444,
author = {Ruwen Fan and Tingxu Ren and Minhui Xie and Shiwei Gao and Jiwu Shu and Youyou Lu},
title = {{GPREEMPT}: {GPU} Preemptive Scheduling Made General and Efficient},
booktitle = {2025 USENIX Annual Technical Conference (USENIX ATC 25)},
year = {2025},
isbn = {978-1-939133-48-9},
address = {Boston, MA},
pages = {263--272},
url = {https://www.usenix.org/conference/atc25/presentation/fan},
publisher = {USENIX Association},
month = jul
}
