Auto-Instrumentation for GPU Performance using eBPF

Wednesday, 8 October, 2025 - 11:50–12:10

Nikola Grcevski, Grafana Labs

This talk explores the potential of using eBPF to capture CUDA calls made to GPUs, including kernel launches and memory allocations. Data from these probes can be exported as Prometheus metrics, enabling detailed analysis of kernel launch patterns and the associated memory usage. The approach offers significant benefits: eBPF imposes minimal overhead and requires no intrusive instrumentation. Because eBPF probes can be attached (or detached) while the GPU application is running, monitoring or profiling of an AI/ML training job can, for example, be enabled after training has already started.

Nikola Grcevski has worked as a software engineer for more than 20 years, mostly in the fields of compilers, managed runtimes, and performance optimization. Most recently, he has been working on low-level application instrumentation with eBPF at Grafana Labs, and he is a maintainer of the OpenTelemetry eBPF Instrumentation project.

BibTeX
@conference {311850,
author = {Nikola Grcevski},
title = {{Auto-Instrumentation} for {GPU} Performance using {eBPF}},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
