Hardware & Datacenter Reliability

Thursday, 9 October, 2025 - 13:30–14:30
Panos Christeas and John Looney, Crusoe.ai

The hardware space has not traditionally had much attention from SRE, but that's changing. SREs in the datacenter automation space are working with hardware and firmware teams at their vendors to improve that layer and make it easier to run for the many years after hardware leaves the factory.
Come and ask questions of John Looney and Panos Christeas, as well as other SREs with decades of experience working with bare metal and new product introductions, and hear what the old hyperscalers can teach the new clouds that are springing up.

John Looney has been a full stack SRE for 20 years, working at every layer from hardware design to 100 million RPC/s revenue booking services. The last year has been spent building the fastest AI training clusters possible, and learning that they are very different to typical datacenters.

BibTeX
@conference {315434,
author = {Panos Christeas and John Looney},
title = {Hardware \& Datacenter Reliability},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}