J. Paul Reed, Release Engineering Approaches
The complexity of the socio-technical systems we engineer, operate, and exist within is staggering. Even so, that complexity is simply a fact of life in software development and operations, and one that is easy to ignore because we interact with those systems daily and know them well. (And, let's face it, ignoring it is often a strategy for coping with that complexity!) When those systems falter or fail, we often find in the postmortems and retrospectives afterward that there were "weak signals" that portended doom, but we didn't know they were there or how to sense them.
In this talk, we'll look at what research in the safety sciences and cognitive psychology has to say about humans interacting with and operating complex socio-technical systems, including what aircraft carriers have to do with Internet infrastructure operations, how resilience engineering can help us, and the use of heuristics in incident response. All of these offer insight into ways we can improve one of the most advanced, and most effective, monitoring tools we have for keeping those systems running: ourselves.
SREcon18 Americas Open Access Videos Sponsored by Indeed

author = {J. Paul Reed},
title = {Whispers in Chaos: Searching for Weak Signals in Incidents},
year = {2018},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = mar
}