Case Study: Implementing SLOs for a New Service

Monday, March 25, 2019 - 10:30 am11:00 am

Arnaud Lawson, Squarespace

Abstract: 

Implementing service level objectives (SLOs) effectively is a hard task, especially for a service which not only is new within your engineering and product organizations but also encompasses both a request-driven and a storage subsystem.

In this talk, I will discuss our experience defining and measuring service level indicators (SLIs) and objectives for our Ceph Object Storage service. I will describe our approach in specifying service level indicators plus the tradeoffs and implementation decisions we made when it came to measuring various types of SLIs, including availability, latency, and durability.

I will also share the lessons learned and benefits gained from our implementation. You will understand why SLOs are crucial for site reliability engineers and service users and will be given some tips on how to implement them for either a request-driven or a storage system.

Arnaud Lawson, Squarespace

Arnaud is a Senior Site Reliability Engineer at Squarespace in New York, where—among other things—he has led the productionization of Ceph as a storage backend used by many Squarespace services.

SREcon19 Americas Open Access Videos Sponsored by
Salesforce

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {229501,
author = {Arnaud Lawson},
title = {Case Study: Implementing {SLOs} for a New Service},
year = {2019},
address = {Brooklyn, NY},
publisher = {USENIX Association},
month = mar
}

Presentation Video