The Bitter and the Sweet of Running a Planet-Scale Build & CI Stack at Google

Tuesday, 7 October, 2025 - 11:00–11:45

Tomasz Koczorowski, Google Germany

This talk offers a deep dive into managing Google's planet-scale Build & CI stack from an SRE perspective. We'll explore the complexities of supporting over 100,000 monthly users and massive workloads while maintaining a 98% cache hit rate across diverse computing environments. Discover resource management strategies for growth that doubles year over year, the application of the Pareto principle, and the critical caching layers (local, remote, P2P). We'll cover output storage that handles hundreds of petabytes daily through deduplication and compression, and trace a user's build journey from desktop to production. Gain insights into optimizing the build stack and maintaining a 24/7 service with a small SRE team through strategic planning, monitoring, and collaboration. Finally, we'll discuss availability risks, standardization, and the pros and cons of a monolithic Build & CI stack.

Tomasz is an Engineering Manager at Google, leading the BATS SRE team, which operates one of the largest Build & CI stacks on the planet. He worked with Sun Starcat and IBM Regatta UNIX systems back in the mid-2000s, and in tech across multiple industries, including telco, rolling stock, data, and gaming, before joining Google. He is based in Munich and is a fan of airshows and Krampus.

BibTeX
@conference {311812,
author = {Tomasz Koczorowski},
title = {The Bitter and the Sweet of Running a {Planet-Scale} Build \& {CI} Stack at Google},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
