User Uptime in Practice

Note: Presentation times are in Coordinated Universal Time (UTC).

Thursday, 14 October, 2021 - 04:00–04:30

Anika Mukherji, Pinterest


As SREs, user experience is ultimately our most important metric. At Pinterest, like many other companies, we were using success rate as a proxy for the quality of our service to our users. However, success rate has many shortcomings as a measure of product quality, which made it difficult for us to understand, measure, and react to changes in "Pinner" experience.

We landed upon User Uptime as our solution. This is a "time-based" metric that presents many advantages over a "count-based" metric like success rate. During this talk, we will discuss how Pinterest implemented this metric, covering the technology stack and design decisions, and what we learned in the process about both our product and our users.
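To make the "count-based" versus "time-based" distinction concrete, here is a minimal sketch in Python. It is not Pinterest's implementation, which the abstract does not describe. It assumes one common formulation of user uptime: each user's time between consecutive requests is labeled "up" or "down" according to the outcome of the earlier request, and the metric is total up time divided by total labeled time across users. The event layout `(user, timestamp_seconds, success)`, the function names, and the `cutoff` parameter are all hypothetical.

```python
from collections import defaultdict


def success_rate(events):
    """Count-based metric: fraction of requests that succeeded."""
    ok = sum(1 for _user, _ts, success in events if success)
    return ok / len(events)


def user_uptime(events, cutoff=300.0):
    """Time-based metric (assumed formulation, see lead-in).

    The interval between two consecutive requests from the same user is
    counted as up or down depending on the outcome of the first request.
    Intervals longer than `cutoff` seconds are capped so that an idle
    user does not accumulate unbounded time.
    """
    by_user = defaultdict(list)
    for user, ts, success in events:
        by_user[user].append((ts, success))

    up = down = 0.0
    for requests in by_user.values():
        requests.sort()
        for (ts, success), (next_ts, _next_ok) in zip(requests, requests[1:]):
            gap = min(next_ts - ts, cutoff)
            if success:
                up += gap
            else:
                down += gap

    total = up + down
    return up / total if total else 1.0


# Hypothetical traffic: user "a" browses normally; user "b" hits a brief
# failure and the client retries once per second.
events = [
    ("a", 0, True), ("a", 30, True), ("a", 60, True),
    ("b", 0, True),
    ("b", 50, False), ("b", 51, False), ("b", 52, False),
    ("b", 53, False), ("b", 54, False), ("b", 55, False),
    ("b", 60, True),
]

print(f"success rate: {success_rate(events):.2f}")
print(f"user uptime:  {user_uptime(events):.2f}")
```

With this sample data, the retry burst drags the success rate down to 5/11 (about 0.45), while user uptime is 110/120 (about 0.92), because only 10 of the 120 seconds of observed user time followed a failed request. This illustrates one way a time-based metric can be less sensitive to request volume than a count-based one.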

Anika Mukherji, Pinterest

Anika is a senior SRE at Pinterest's HQ in San Francisco. She is embedded in several teams, including the API platform team, the web platform team, the traffic team, and the continuous delivery team. She focuses on making the core "Pinner" experience reliable and measurable, with a special emphasis on safe production changes. She also has experience in the performance realm and has worked on improving the speed of the Pinterest product.

SREcon21 Open Access Sponsored by Indeed

@conference {276753,
author = {Anika Mukherji},
title = {User Uptime in Practice},
year = {2021},
publisher = {USENIX Association},
month = oct
}
