Andrew Medworth, Google UK
Leader election is a popular design pattern for distributed systems managing critical state. But despite its simple and innocent appearance, it harbors hidden dangers. A reliable leader-elected service requires more than just a proven consensus implementation and a superficial strategy for handling the lower availability that comes with strong consistency.
This talk will present the theory and practice of fencing, illustrated by a serious outage of a Google service that we thought was doing everything right. It will also discuss some challenges of operating leader-elected services, and some alternatives to leader election.

Andrew Medworth is a Staff SRE at Google. He manages the London half of Traffic Interconnect SRE, a team responsible for hybrid connectivity and NAT for Google Cloud Networking. He is currently Tech Lead for Torpedo, which is the system responsible for transport-layer egress from Borg, Google's cluster manager.

author = {Andrew Medworth},
title = {Leader Election: Pitfalls and Alternatives},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
