Search results

    When Systems Flatline—Enhancing Incident Response with Learnings from the Medical Field | SREcon21 | Sarah Butt
    Leveraging ML to Detect Application HotSpots [@scale, of Course!] | SREcon21 | Sanket Patel
    Ceci N'est Pas un CPU Load | SREcon21 | Thomas Depierre
    Scaling for a Pandemic: How We Keep Ahead of Demand for Google Meet during COVID-19 | SREcon21 | Samantha Schaevitz
    Designing an Autonomous Workbench for Data Science on AWS | SREcon21 | Dipen Chawla
    You've Lost That Process Feeling: Some Lessons from Resilience Engineering | SREcon21 | David D. Woods, Laura Nolan
    Model Monitoring: Detecting and Analyzing Data Issues | SREcon21 | Dmitri Melikyan
    Panel: OpML | SREcon21 | Vanessa Yiu, Todd Underwood, Josh Hartman, Zhangwei Xu, Nisha Talagala
    What If the Promise of AIOps Was True? | SREcon21 | Niall Murphy
    Demystifying Machine Learning in Production: Reasoning about a Large-Scale ML Platform | SREcon21 | Mary McGlohon
    How We Built Out Our SRE Department to Support over 100 Million Users for the World's 3rd Biggest Mobile Marketplace | SREcon21 | Sinéad O'Reilly
    Horizontal Data Freshness Monitoring in Complex Pipelines | SREcon21 | Alexey Skorikov
    Microservices above the Cloud—Designing the International Space Station for Reliability | SREcon21 | Robert Barron
    Grand National 2021: Managing Extreme Online Demand at William Hill | SREcon21 | Matthew Berridge, Josh Allenby
    DevOps Ten Years After: Review of a Failure with John Allspaw and Paul Hammond | SREcon21 | Thomas Depierre, John Allspaw, Paul Hammond
    Optimizing Cost and Performance with arm64 | SREcon21 | Liz Fong-Jones
    Cache Strategies with Best Practices | SREcon21 | Tao Cai
    How LinkedIn Performs Maintenances at Scale | SREcon21 | Akash Vacher
    From 15,000 Database Connections to under 100—A Tech Debt Tale | SREcon21 | Sunny Beatteay
    Let's Bring System Dynamics Back to CS! | SREcon21 | Marianne Bellotti
    Trustworthy Graceful Degradation: Fault Tolerance across Service Boundaries | SREcon21 | Daniel Rodgers-Pryor
    A Principled Approach to Monitoring Streaming Data Infrastructure at Scale | SREcon21 | Eric Schow, Praveen Yedidi
    Watching the Watchers: Generating Absent Alerts for Prometheus | SREcon21 | Nick Spain
    Rethinking the SDLC | SREcon21 | Emily Freeman
    SLX: An Extended SLO Framework to Expedite Incident Recovery | SREcon21 | Qian Ding, Xuan Zhang