Search results

    SRE for ML: The First 10 Years and the Next 10 | SREcon21 | Todd Underwood
    What If the Promise of AIOps Was True? | SREcon21 | Niall Murphy
    Ceci N'est Pas un CPU Load | SREcon21 | Thomas Depierre
    Panel: OpML | SREcon21 | Vanessa Yiu, Todd Underwood, Josh Hartman, Zhangwei Xu, Nisha Talagala
    Scaling for a Pandemic: How We Keep Ahead of Demand for Google Meet during COVID-19 | SREcon21 | Samantha Schaevitz
    Designing an Autonomous Workbench for Data Science on AWS | SREcon21 | Dipen Chawla
    You've Lost That Process Feeling: Some Lessons from Resilience Engineering | SREcon21 | David D. Woods, Laura Nolan
    Demystifying Machine Learning in Production: Reasoning about a Large-Scale ML Platform | SREcon21 | Mary McGlohon
    Leveraging ML to Detect Application HotSpots [@scale, of Course!] | SREcon21 | Sanket Patel
    How We Built Out Our SRE Department to Support over 100 Million Users for the World's 3rd Biggest Mobile Marketplace | SREcon21 | Sinéad O'Reilly
    Model Monitoring: Detecting and Analyzing Data Issues | SREcon21 | Dmitri Melikyan
    Horizontal Data Freshness Monitoring in Complex Pipelines | SREcon21 | Alexey Skorikov
    Microservices above the Cloud—Designing the International Space Station for Reliability | SREcon21 | Robert Barron
    Grand National 2021: Managing Extreme Online Demand at William Hill | SREcon21 | Matthew Berridge, Josh Allenby
    DevOps Ten Years After: Review of a Failure with John Allspaw and Paul Hammond | SREcon21 | Thomas Depierre, John Allspaw, Paul Hammond
    From 15,000 Database Connections to under 100—A Tech Debt Tale | SREcon21 | Sunny Beatteay
    Optimizing Cost and Performance with arm64 | SREcon21 | Liz Fong-Jones
    Let's Bring System Dynamics Back to CS! | SREcon21 | Marianne Bellotti
    Trustworthy Graceful Degradation: Fault Tolerance across Service Boundaries | SREcon21 | Daniel Rodgers-Pryor
    A Principled Approach to Monitoring Streaming Data Infrastructure at Scale | SREcon21 | Eric Schow, Praveen Yedidi
    Watching the Watchers: Generating Absent Alerts for Prometheus | SREcon21 | Nick Spain
    Cache Strategies with Best Practices | SREcon21 | Tao Cai
    SLX: An Extended SLO Framework to Expedite Incident Recovery | SREcon21 | Qian Ding, Xuan Zhang
    MySQL and InnoDB Performance for the Rest of Us | SREcon21 | Shaun O'Keefe
    How LinkedIn Performs Maintenances at Scale | SREcon21 | Akash Vacher