Autonomous Automation: How Cloudflare Handles Server Diagnostics and Recovery at Scale

Wednesday, June 14, 2023 - 2:55 pm–3:20 pm

Jet Mariscal, Cloudflare

Abstract: 

This talk describes the difference between automation and autonomy and walks through the thought process for transforming an automated system into an autonomous one. It includes a synopsis of a system that autonomously handles server diagnostics and recovery at scale at Cloudflare, which operates fleets of servers in data centers all over the globe. It also covers how that system was designed, highlighting how a few specific principles, including some essential SRE principles, played a crucial role in its success.

This presentation is applicable to anyone, regardless of organization size or industry. It will help attendees looking to implement, improve, or transform existing automations into autonomous automations that drive value and lead to increased efficiency, productivity, and competitiveness in the long run.

Jet Mariscal, Cloudflare

Jet was an SRE and is currently working as the Infrastructure Engineering Tech Lead at Cloudflare. Previously, he was an SRE at Teralytics, working on Big Data systems across several data centers around the world. Jet specializes in architecting and implementing large-scale fault-tolerant and high-availability distributed systems. Over his career, he’s built various systems and authored internal tools for automation in multiple programming languages.

BibTeX
@conference {288271,
author = {Jet Mariscal},
title = {Autonomous Automation: How Cloudflare Handles Server Diagnostics and Recovery at Scale},
year = {2023},
address = {Singapore},
publisher = {USENIX Association},
month = jun
}
