Observability Is Not Analytics!

Wednesday, December 07, 2022 - 3:30 pm–4:30 pm AEDT

Andrew Cowie

Abstract: 

Implementing observability was a game-changer. We dramatically reduced the time it takes us to identify problems, isolate causes, and see the effects of changes.

But it's not quite as easy to retrofit as we might like to think. Brooks taught us to be wary of doing things over, but we couldn't safely make even basic changes to the existing codebase. Being able to do observability at all was a major motivation for a massive re-engineering. We'll share lessons learned as we rebuilt a large distributed system.

As we iterate the code we iterate our telemetry, too. Once you've learned something and changed the system, it's a new system; telemetry is not a continuous function! This has a drawback: you can't use observability as a substitute for business metrics. Which raises an interesting question: can you actually measure your SLOs using SLIs in a distributed system?

BibTeX
@conference {284909,
author = {Andrew Cowie},
title = {Observability Is Not Analytics!},
year = {2022},
address = {Sydney},
publisher = {USENIX Association},
month = dec
}
