How To Take Prometheus Planet Scale: Massively Large Scale Metrics Installations

Thursday, March 23, 2023 - 1:55 pm–2:40 pm

Vijay Samuel and Nick Pordash, eBay

Abstract: 

Observability at eBay has been on an exponential growth curve. What was a low 2M/sec ingest rate of time series in 2017 is now roughly 40M/sec, with close to 3 billion active time series. Our current Cortex-inspired architecture builds sharding and clustering on top of the Prometheus TSDB. It is relatively simple to shard and replicate tenants of data in centralized clusters. However, large clusters with growing cardinality become less useful as query latencies degrade considerably. In 2020, Google published a paper on Monarch, its time series database, which is dubbed a planet-scale TSDB. The paper gave us some useful hints on how we could decentralize our installation and go fully planet scale.

What started off as a humble prototype to federate queries from a centralized query instance to TSDBs deployed in Sydney, Amsterdam, and the US is now a living, breathing entity that allows us to deploy our TSDBs anywhere in the world using simple Kubernetes operators, GitOps, and intelligence on top of the Prometheus TSDB.

This talk focuses on:

  • the development of field hint indices that fingerprint time series, and their use for pointed query fanout
  • functional query pushdown on top of Prometheus storage
  • the struggles of managing a planet-scale deployment and using GitOps to mitigate the pain
  • other lessons learned

Vijay Samuel, eBay

Vijay Samuel works with eBay's observability platform as its architect. During his time at eBay, Vijay has transformed the platform into a cloud native offering built primarily on open source technologies. He loves to code in Go and play video games.

Nick Pordash, eBay

Nick Pordash is the lead engineer of the observability platform at eBay. He solves the hard problems of scaling the platform.

BibTeX
@conference {286305,
author = {Vijay Samuel and Nick Pordash},
title = {How To Take Prometheus Planet Scale: Massively Large Scale Metrics Installations},
year = {2023},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}
