Over the course of my career, I've had the opportunity to work with a number of organizations on their operational maturity. After doing "systems archeology" many times when starting at new organizations, I began to recognize certain signature "smells" that indicated something could be improved, and I often had a pretty good idea of how those situations came to be.
The volume of pager alerts, for example, can indicate a poor signal-to-noise ratio, overworked infrastructure, or a broken architecture. Elaborate change control can be a sign of inadequate testing or a lack of automation (as if a review by people unfamiliar with the changes makes them safer). Recovery mechanisms that are never tested will not work when they are needed, except in the most trivial of cases.
There are many more such examples: single points of failure, competing change mechanisms, scaling challenges, outsourcing of manual automation (not a typo), badly scoped runbooks, immature monitoring, multi-generational monitoring systems, and more. All are signs that we can do better.
In this talk, we'll look at some of the fun that was had over the years maturing different infrastructures and learning from failure and success, and at how we can take lessons from "mistakes were made" scenarios to increase our performance, lower our MTTR, and help those in the systems engineering organization love their jobs.
Dave Mangot is the author of Mastering DevOps from Packt Publishing. He was previously the head of Site Reliability Engineering (SRE) for the SolarWinds Cloud companies and is an accomplished systems engineer with over 20 years' experience. He has held positions in various organizations, from small startups to multinational corporations such as Cable & Wireless and Salesforce, from systems administrator to architect. He has led transformations at multiple companies in operational maturity and in a deeper adherence to DevOps thinking. He enjoys time spent as a mentor, speaker, and student to so many talented members of the community.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.