David Mah, Dropbox
At Dropbox, we’ve worked hard to build infrastructure that we can trust. Much of that confidence comes from verifying our data at rest, which gives us a signal that the data will be usable when requests actually come in.
In this talk, we’ll walk through how to design and build a consistency checker. We’ll start with the needs and goals of such a system, then cover its sub-components. We’ll address both the distributed system design and how to make the alert escalation workflow as simple as possible for human operators.
Attendees should leave the session understanding how to build consistency checkers for their own systems. This includes:
- Do you even need a consistency checker?
- What independent components need to exist?
- What is a good alerting + triaging workflow?
- What is involved in an auto-remediation mechanism for constraint failures?
David Mah is a Site Reliability Engineer at Dropbox who has built several monitoring mechanisms across Dropbox’s block storage and server file system infrastructure. He is also the author of Dropbox’s auto-remediation infrastructure.
author = {David Mah},
title = {Data Checking at Dropbox},
year = {2017},
publisher = {USENIX Association},
month = may
}