Identifying Hidden Dependencies

Monday, December 07, 2020 - 11:00 am–11:45 am

Liz Fong-Jones, Honeycomb


You don't need to write automation or deploy on Kubernetes to gain benefits from resilience engineering! Learn how Honeycomb improved the reliability of our ZooKeeper, Kafka, and stateful storage systems by terminating nodes on purpose. We'll discuss the initial manual experiments we ran, the bugs we uncovered in our automatic replacement tools, and the steps we needed to take to progress towards running the experiments continuously. Today, no node at Honeycomb lives longer than 12 months, and we automatically recycle nodes every week.

Liz Fong-Jones, Honeycomb

Liz is a developer advocate, labor and ethics organizer, and Site Reliability Engineer (SRE) with 16+ years of experience. She is an advocate at Honeycomb for the SRE and Observability communities, and previously was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights.

@inproceedings {262253,
author = {Liz Fong-Jones},
title = {Identifying Hidden Dependencies},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {},
publisher = {USENIX Association},
month = dec
}
