Confessions of a Systems Engineer: Learning from My 20+ Years of Failure

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Thursday, March 26, 2020 - 9:30 am–10:20 am

David Argent, Amazon

Abstract: 

There's no holy book of best practices for running large online services. We rely on what we've learned along the way, often taught to us by having things break. Failure is a great but expensive teacher, and it's usually better to learn from someone else's mistakes. Over a long career I've made or experienced plenty of mistakes firsthand, and I've built a list of conceptual lessons from them. This is a non-exhaustive list of things to think about when designing and running a large-scale online service, rather than a prescriptive checklist.

David Argent, Amazon

With over 20 years of experience in the tech industry and job titles including Technical Writer, Systems Engineer, Program Manager, and Lead Problem Engineer (my personal favorite), I've worn more than a few hats and been victimized by more than a few badly designed online services. I spent 19 years at Microsoft working on various TV-division projects, Windows Phone, Cortana, and Bing before starting my new adventure at Amazon, where I help run what is, depending on who you talk to, the largest NoSQL installation in the world.

BibTeX
@conference {247247,
author = {David Argent},
title = {Confessions of a Systems Engineer: Learning from My 20+ Years of Failure},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}