Search results

    Panel: OpML | SREcon21 | Vanessa Yiu, Todd Underwood, Josh Hartman, Zhangwei Xu, Nisha Talagala
    Scaling for a Pandemic: How We Keep Ahead of Demand for Google Meet during COVID-19 | SREcon21 | Samantha Schaevitz
    Designing an Autonomous Workbench for Data Science on AWS | SREcon21 | Dipen Chawla
    You've Lost That Process Feeling: Some Lessons from Resilience Engineering | SREcon21 | David D. Woods, Laura Nolan
    Demystifying Machine Learning in Production: Reasoning about a Large-Scale ML Platform | SREcon21 | Mary McGlohon
    Leveraging ML to Detect Application HotSpots [@scale, of Course!] | SREcon21 | Sanket Patel
    How We Built Out Our SRE Department to Support over 100 Million Users for the World's 3rd Biggest Mobile Marketplace | SREcon21 | Sinéad O'Reilly
    Model Monitoring: Detecting and Analyzing Data Issues | SREcon21 | Dmitri Melikyan
    Horizontal Data Freshness Monitoring in Complex Pipelines | SREcon21 | Alexey Skorikov
    Microservices above the Cloud—Designing the International Space Station for Reliability | SREcon21 | Robert Barron
    Grand National 2021: Managing Extreme Online Demand at William Hill | SREcon21 | Matthew Berridge, Josh Allenby
    DevOps Ten Years After: Review of a Failure with John Allspaw and Paul Hammond | SREcon21 | Thomas Depierre, John Allspaw, Paul Hammond
    From 15,000 Database Connections to under 100—A Tech Debt Tale | SREcon21 | Sunny Beatteay
    Optimizing Cost and Performance with arm64 | SREcon21 | Liz Fong-Jones
    Let's Bring System Dynamics Back to CS! | SREcon21 | Marianne Bellotti
    Trustworthy Graceful Degradation: Fault Tolerance across Service Boundaries | SREcon21 | Daniel Rodgers-Pryor
    A Principled Approach to Monitoring Streaming Data Infrastructure at Scale | SREcon21 | Eric Schow, Praveen Yedidi
    Watching the Watchers: Generating Absent Alerts for Prometheus | SREcon21 | Nick Spain
    Cache Strategies with Best Practices | SREcon21 | Tao Cai
    SLX: An Extended SLO Framework to Expedite Incident Recovery | SREcon21 | Qian Ding, Xuan Zhang
    MySQL and InnoDB Performance for the Rest of Us | SREcon21 | Shaun O'Keefe
    How LinkedIn Performs Maintenances at Scale | SREcon21 | Akash Vacher
    Need for SPEED: Site Performance Efficiency, Evaluation and Decision | SREcon21 | Kingsum Chow, Zhihao Chang
    Elephant in the Blameless War Room—Accountability | SREcon21 | Christina Tan, Emily Arnott
    Take Me Down to the Paradise City Where the Metric Is Green and Traces Are Pretty | SREcon21 | Ricardo Ferreira