Testing Before You Scale & Making Friends While You Do It

Renee Lung

Wednesday, November 01, 2017 - 2:30 pm–3:00 pm

Renee Lung, PagerDuty

Your customers shouldn’t find problems before you do. When we develop software and make architectural decisions, we try to anticipate potential problems: ambiguous user interfaces, performance bottlenecks, and other edge cases. Generally we do a good job of it, but as system complexity grows, the mental models we use to plan and understand our systems don’t always keep up with that complexity. So what do we do about this? We can test all the things! By using automation, we test complex scaling scenarios to validate our mental models and to identify unanticipated side effects.

One of the issues we recently dealt with was supporting a major change in our traffic patterns. Although overall load stayed the same, the stress points produced by that load changed significantly. Major shifts like these always have the potential to disrupt our service, and in turn, disrupt our customers’ ability to keep their systems running. We had some predictions about how our system would react to the new load profile, but we wanted to validate those predictions ourselves rather than waiting for our customers to experience service degradation.

Although each engineering team had some idea of how these changes would affect the performance of their own services and had work scheduled to address those issues, I wanted to make sure we were all equipped to make informed prioritization and planning decisions. All I had to do was figure out a way to consolidate the efforts of more than 90 engineers into one focused attack on our scaling challenges.

Fortunately, I didn’t have to start from scratch: I could build on existing attitudes of collaboration and ownership, and on a culture of reliability that has produced a rich toolset for testing resilience and scalability. This talk will outline how we used those tools and developed new ones, what we learned in the process, and the challenges of consolidating the efforts of separate teams towards a specific, common initiative.

I’m a full-stack engineer at PagerDuty, and I work on one team in a fairly large engineering department. One of the things I love most about my job is that I get to work on back-end services to make sure all the wiring and plumbing is doing its job, but I also get to do some front-end development so I can see my code in action. Working at PagerDuty is my first experience with DevOps, so in addition to learning a lot about the systems that back up my code, I’ve also learned to really appreciate the work my colleagues do and the services they are responsible for. Before discovering how much I love programming, I was a graduate student, a bread baker, and a graphic designer. When I'm not lost in the endless tubes of internet, I'm playing roller derby, cross-stitching, or watching Star Trek with my cats.

Connect:

@renee_lung

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX

@conference {207207,
author = {Renee Lung},
title = {Testing Before You Scale \& Making Friends While You Do It},
year = {2017},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = oct
}
