Case Study: Implementing SLOs for a New Service

Monday, March 25, 2019 - 10:30 am–11:00 am

Arnaud Lawson, Squarespace

Abstract: 

Implementing service level objectives (SLOs) effectively is hard, especially for a service that is not only new to your engineering and product organizations but also encompasses both a request-driven subsystem and a storage subsystem.

In this talk, I will discuss our experience defining and measuring service level indicators (SLIs) and objectives for our Ceph Object Storage service. I will describe our approach to specifying SLIs, along with the tradeoffs and implementation decisions we made in measuring several types of SLIs, including availability, latency, and durability.

I will also share the lessons learned and the benefits gained from our implementation. You will learn why SLOs are crucial for site reliability engineers and service users, and you will come away with tips on implementing them for either a request-driven or a storage system.

Arnaud Lawson, Squarespace

Arnaud is a Senior Site Reliability Engineer at Squarespace in New York, where—among other things—he has led the productionization of Ceph as a storage backend used by many Squarespace services.

BibTeX
@conference {229501,
author = {Arnaud Lawson},
title = {Case Study: Implementing SLOs for a New Service},
year = {2019},
address = {Brooklyn, NY},
publisher = {{USENIX} Association},
}