Gamifying Reliability Excellence—The Service Score Card

Thursday, August 31, 2017 - 16:00–16:30

Daniel Lawrence, LinkedIn

Abstract: 

What makes a “good” service is a moving target. Technologies and requirements change over time, and it can be impossible to ensure that none of your services has been left behind. The Service ScoreCard approach is to have a small check for each service initiative we have. A check can be anything measurable: deployment frequency, whether everyone on the on-call team has a phone, or whether the service runs the latest version of the JVM. The Service ScoreCard gives each service a grade from 'F' to 'A+', based on passing or failing the list of checks. As soon as anyone sees a service's grade slipping, everyone rallies to improve it. We can then set up rules based on the grades: “Only services graded B and above can deploy 24/7”, “a moratorium on services without an A+”, or “No SRE support for services below a C grade”.
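The abstract does not say how check results turn into a grade, so the following is only a minimal sketch of the idea, not the LinkedIn implementation. The grade ladder between 'F' and 'A+', the pass-ratio cutoffs, the check thresholds, the `LATEST_JVM` value, and every field and function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed grade ladder, worst to best. The abstract only names 'F' and 'A+'.
GRADES = ["F", "D", "C", "B", "A", "A+"]

# Hypothetical value standing in for "the latest version of the JVM".
LATEST_JVM = "8u144"


@dataclass
class Check:
    """One small, measurable pass/fail check for a service initiative."""
    name: str
    passes: Callable[[Dict], bool]


# Hypothetical checks mirroring the three examples in the abstract.
CHECKS: List[Check] = [
    Check("deploys at least 4 times in 30 days",
          lambda s: s["deploys_last_30_days"] >= 4),
    Check("everyone on call has a phone",
          lambda s: s["oncall_with_phone"] == s["oncall_size"]),
    Check("runs the latest JVM",
          lambda s: s["jvm_version"] == LATEST_JVM),
]


def grade_service(service: Dict, checks: List[Check]) -> str:
    """Map the fraction of passed checks to a letter grade (assumed cutoffs)."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks if check.passes(service))
    if passed == len(checks):
        return "A+"  # reserved for services that pass every check
    ratio = passed / len(checks)
    for cutoff, letter in [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]:
        if ratio >= cutoff:
            return letter
    return "F"


def can_deploy_anytime(grade: str) -> bool:
    """Rule from the abstract: only B and above may deploy 24/7."""
    return GRADES.index(grade) >= GRADES.index("B")


healthy = {"deploys_last_30_days": 12, "oncall_with_phone": 4,
           "oncall_size": 4, "jvm_version": LATEST_JVM}
stale = dict(healthy, jvm_version="8u131")  # fails only the JVM check

for name, service in [("healthy", healthy), ("stale", stale)]:
    grade = grade_service(service, CHECKS)
    print(name, grade, can_deploy_anytime(grade))
```

Keeping each check as a small named predicate means a new initiative can be added as one more entry in the list, and rules such as the 24/7 deploy gate only need to look at the resulting grade.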

Daniel Lawrence, LinkedIn

Daniel will fix anything with Python, even if it's not broken. He is an Aussie on loan to LinkedIn in the USA as an SRE, focusing on looking after the jobs and recruiting services. When he is not working on tricky problems for LinkedIn, he plays _a lot_ of video games.

BibTeX
@conference {205550,
author = {Daniel Lawrence},
title = {Gamifying Reliability Excellence{\textemdash}The Service Score Card},
year = {2017},
address = {Dublin},
publisher = {{USENIX} Association},
}