Search results

    The Origins of USAA's Postmortem of the Week | SREcon21 | Adam Newman
    A Retrospective: Five Years Later, Was Chaos Engineering Worth It? | SREcon21 | Mikolaj Pawlikowski, Sachin Kamboj
    Beyond Goldilocks Reliability | SREcon21 | Narayan Desai
    Panel: Unsolved Problems in SRE | SREcon21 | Kurt Andersen, Niall Murphy, Narayan Desai, Laura Nolan, Xiao Li, Sandhya Ramu
    How Our SREs Safeguard Nanosecond Performance—at Scale—in an Environment Built to Fail | SREcon21 | Jillian Hawker
    SRE "Power Words"—the Lexicon of SRE as an Industry | SREcon21 | Dave O'Connor
    Reliable Data Processing with Minimal Toil | SREcon21 | Pieter Coucke, Julia Lee
    Experiments for SRE | SREcon21 | Debbie Ma
    Hard Problems We Handle in Incidents but Aren't Recognized | SREcon21 | John Allspaw
    Nine Questions to Build Great Infrastructure Automation Pipelines | SREcon21 | Rob Hirschfeld
    Lessons Learned Using the Operator Pattern to Build a Kubernetes Platform | SREcon21 | Pavlos Ratis
    Of Mice & Elephants | SREcon21 | Koon Seng Lim, Sandeep Hooda
    Learning More from Complex Systems | SREcon21 | Andrew Hatch
    User Uptime in Practice | SREcon21 | Anika Mukherji
    Practical TLS Advice for Large Infrastructure | SREcon21 | Mark Hahn, Ted Hahn
    Improving Observability in Your Observability: Simple Tips for SREs | SREcon21 | Dan Shoop
    Nothing to Recommend It: An Interactive ML Outage Fable | SREcon21 | Todd Underwood
    Evolution of Incident Management at Slack | SREcon21 | D. Brent Chapman
    Automating Performance Tuning with Machine Learning | SREcon21 | Stefano Doni
    When Systems Flatline—Enhancing Incident Response with Learnings from the Medical Field | SREcon21 | Sarah Butt
    Hacking ML into Your Organization | SREcon21 | Cathy Chen
    SRE for ML: The First 10 Years and the Next 10 | SREcon21 | Todd Underwood
    What If the Promise of AIOps Was True? | SREcon21 | Niall Murphy
    Ceci N'est Pas un CPU Load | SREcon21 | Thomas Depierre
    Panel: OpML | SREcon21 | Vanessa Yiu, Todd Underwood, Josh Hartman, Zhangwei Xu, Nisha Talagala