What I Wish I Knew before Going On-call

Monday, October 29, 2018 - 4:00 pm–5:30 pm

Chie Shu, Dorothy Jung, and Wenting Wang, Yelp

Abstract: 

Being a software engineer means owning a production system: many users, and your company's revenue, rely on your products. Firefighting a broken system is a time-sensitive and stressful part of life as an engineer. New engineers entering an on-call rotation may be overwhelmed by this responsibility. How should we act in an emergency? How can we make a system emergency-friendly? In this workshop, we will share how new and future on-call engineers can be successful by guiding participants through exercises to triage real-life engineering emergency scenarios. We will also cover how on-call engineers can share learnings within an organization to prevent future fires.

Chie Shu, Yelp

Chie Shu is a backend Software Engineer at Yelp. She has worked on improving the revenue-critical Ads data pipeline to be more resilient to system failures and designed heuristics used by executives and Product Managers to assess the financial impact. She is a leader for Yelp’s Awesome Women in Engineering support group, and has organized events to foster an inclusive community for incoming women engineers and allies. Chie holds a Bachelor’s degree in Computational Biology from Cornell University.

Dorothy Jung, Software Engineer, Yelp

Dorothy Jung is a Software Engineer with multiple years of on-call experience. At Yelp she has served as a “pushmaster”, managing and monitoring company-wide deployments to production; and as a release engineering deputy, helping to set up CI/CD pipelines within the Ads organization. She was previously at DreamWorks Animation R&D, where she worked on upgrading the studio’s build management tools. Dorothy holds a bachelor’s degree in Computer Science and French from the University of California, Berkeley.

Wenting Wang, Software Engineer, Yelp

Wenting Wang is a Software Engineer with three years of industry experience. She has been on-call for different teams at Yelp: on the BizApp backend team, where she worked closely with mobile developers and monitored mobile user traffic; and on the Ads team, where she currently develops and maintains revenue-critical real-time processing systems. Wenting received her master’s degree in Computer Science from Shanghai Jiao Tong University and was previously a doctoral candidate in Computer Science focusing on distributed systems at the University of Illinois at Urbana-Champaign.

BibTeX
@conference {221838,
author = {Chie Shu and Dorothy Jung and Wenting Wang},
title = {What I Wish I Knew before Going On-call},
year = {2018},
address = {Nashville, TN},
publisher = {USENIX Association},
month = oct
}

Who should attend:

Early-career Software/Site Reliability/DevOps Engineers, managers looking to improve the on-call process or train new on-call engineers, startups creating an on-call rotation for the first time

Take back to work: 
  • The importance of being on-call
  • Strategies and tips when firefighting
  • Example case studies of actual incidents and resolution
  • How to write a postmortem and prevent future fires

Topics include:

On-call strategy, Cross-team communication