Callum Donald, BlackRock
Our decentralised operational model effectively supported our system for years, but we recognised it might not scale with increasing complexity and rising reliability expectations. To explore a solution, we initiated a deliberate trial, embedding an SRE within a key trading systems engineering team. Working directly alongside the engineers enabled us to drive operational accountability, align closely with organisational OKRs, and build the trust necessary for meaningful reliability improvements from within.
We navigated unique constraints: traditional tools like error budgets and gradual SLO-based alerting were incompatible with finance's zero tolerance for delays. Instead, we reshaped alerting around practical telemetry and revitalised incident retrospectives, making them more effective at drawing out actionable insights. Over 12 months, this approach rebuilt trust, significantly reduced incidents, and required no additional headcount.
This session shares practical insights for embedding SRE in your organisation, demonstrating how to adapt standard practices, deliver immediate value, and foster a sustainable reliability culture.

Callum is a Senior Site Reliability Engineer at BlackRock. He was the first embedded SRE within the company's Aladdin platform, where he helped pioneer an approach that has since been scaled across dozens of product and platform teams. His work focuses on building reliability practices that are practical, sustainable and tailored to the realities of financial systems. Callum is a first-time speaker at SREcon.

author = {Callum Donald},
title = {Lessons from an Asset {Manager{\textquoteright}s} First Embedded {SRE}},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
