Prepping for the Worst—What Your On-Call Team Should Know

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Thursday, March 26, 2020 - 12:05 pm–12:40 pm

Amiya Adwitiya and Biju Chacko, Squadcast Inc

Abstract: 

When systems break, the situation is always urgent and stressful, and it only gets worse as your team scales and your stack evolves. That makes up-to-date documentation and a high-level onboarding plan essential for keeping everyone aligned.

This talk will address the following subjects:

  1. How do we train and onboard new engineers effectively for being on call?
  2. How can we build an atmosphere of trust and confidence between various stakeholders?
  3. How do we pick the right folks to be on call for a specific service?
  4. What tools and processes are popular and which ones are advisable in specific contexts?

In the end, the audience should be able to:

  1. Create an onboarding procedure for their on-call team
  2. Apply best practices to their own processes

Amiya Adwitiya, Squadcast Inc

Amiya Adwitiya is the Founder and CEO of Squadcast, an end-to-end incident response platform built around SRE best practices for tech teams to avoid unplanned downtime. He previously worked with Freshworks, a customer engagement software company, and Accel Partners, a venture capital firm.

Biju Chacko, Squadcast Inc

Biju Chacko is a 20+ year veteran of Unix system operations. In the past, he has been a prominent open-source advocate, helped popularize Linux in India, and was a core committer of the Xfce Desktop. Most recently, he helped lead an operations team that managed a fleet of 55,000 physical servers.

BibTeX
@conference {247241,
author = {Amiya Adwitiya and Biju Chacko},
title = {Prepping for the Worst{\textemdash}What Your On-Call Team Should Know},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}