Confessions of a Systems Engineer: Learning from My 20+ Years of Failure

Tuesday, December 08, 2020 - 1:25 pm to 2:05 pm

David Argent, Amazon


There's no holy book of best practices for running large online services. We rely on what we've learned along the way, often taught to us by having things break. Failure is a great but expensive teacher, and it's usually better to learn from someone else's mistakes. Over a long career I've made, or witnessed firsthand, plenty of mistakes, and from them I've built a list of conceptual lessons. This is not a prescriptive checklist but a non-exhaustive list of things to think about when designing and running a large-scale online service.

With over 20 years of experience in the tech industry and job titles ranging from Technical Writer to Systems Engineer to Program Manager to Lead Problem Engineer (my personal favorite), I've worn more than a few hats and been victimized by more than a few badly designed online services. Nineteen years at Microsoft had me working on various TV-division projects, Windows Phone, Cortana, and Bing. Then came my new adventure with Amazon, where I help run what, depending on who you talk to, is the largest NoSQL installation in the world.

@inproceedings {262222,
author = {David Argent},
title = {Confessions of a Systems Engineer: Learning from My 20+ Years of Failure},
booktitle = {SREcon20 Americas (SREcon20 Americas)},
year = {2020},
url = {},
publisher = {USENIX Association},
month = dec
}