Unlock High-Frequency Deployments without Blowing Up Prometheus

Thursday, March 26, 2026 - 1:55 pm–2:15 pm

Ganesh Vernekar, Reddit

High-frequency deployments often create a hidden bottleneck in Kubernetes: time-series churn. As pods come and go, Prometheus accumulates "stale" series in memory, leading to dangerous memory spikes and OOM crashes.

This talk introduces stale-series compaction, a feature that proactively flushes stale data from memory to disk and protects your Prometheus server during periods of high series churn. Beyond the design, I will share critical learnings from production experiments at Reddit, including what to expect in terms of resource usage and what this feature is not for. Attendees will leave with a clear playbook for enabling this feature to unblock high-frequency rollouts without destabilizing their monitoring infrastructure.

Ganesh is a Staff Engineer at Reddit working on observability infrastructure and has been contributing to Prometheus for 8 years. He is also a maintainer of the Prometheus TSDB and a member of the Prometheus team. In his previous role at Grafana Labs, he worked on Mimir, Cortex, and Grafana.

BibTeX
@conference {316290,
author = {Ganesh Vernekar},
title = {Unlock {High-Frequency} Deployments without Blowing Up Prometheus},
year = {2026},
address = {Seattle, WA},
publisher = {USENIX Association},
month = mar
}
