Delivering Business Impact through Culture Change. How We Saved Millions by Celebrating Failure through Learnings

Due to the evolving Coronavirus/COVID-19 situation, SREcon20 Americas West has been rescheduled to June 2–4, 2020.

Tuesday, March 24, 2020, 2:50 pm to 3:30 pm

Aniket Kulkarni, PayPal

Abstract: 

As a leader in the fintech space, we've come far in our reliability journey. We implemented Service Level Objectives (SLOs) based on Failed Customer Interactions and drove systematic improvements through tooling and automation to reach 99.98% availability. The last mile to 99.99% required scaling the SRE culture across an organization of more than 7,000 engineers. Through a strategic plan involving dozens of developers, SREs, product managers, and influencers, we encouraged and rewarded a culture that is comfortable discussing and analyzing failure in a blameless way. Now every one of the 500+ P0/P1 issues we see each year goes through a grassroots-driven analysis, and the learnings are published to the entire company. The resulting dataset is mined to identify themes that drive our reliability investment plans. The 0.01 percentage-point increase in availability (from 99.98% to 99.99%) translates to millions of dollars in annual bottom-line revenue for the company. In this talk, we cover the strategy, initiatives, incentives, setbacks, learnings, and iterations that got us here.
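To make the size of that "last mile" concrete, here is a minimal sketch of the arithmetic. It assumes an availability SLI defined as the fraction of customer interactions that did not fail; the abstract does not give PayPal's exact Failed Customer Interaction formula, and the function names below are hypothetical. It also converts each target into equivalent minutes of full outage per year, which shows that moving from 99.98% to 99.99% halves the failure budget (from about 105 to about 53 minutes).

```python
"""Hypothetical sketch: an availability SLI based on failed customer
interactions (FCIs) and the yearly failure budget implied by a target.
The FCI formula and all names are illustrative assumptions, not
PayPal's implementation."""

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def fci_availability(total_interactions: int, failed_interactions: int) -> float:
    """Assumed SLI: share of customer interactions that did not fail."""
    if total_interactions <= 0:
        raise ValueError("total_interactions must be positive")
    if not 0 <= failed_interactions <= total_interactions:
        raise ValueError("failed_interactions must be in [0, total_interactions]")
    return 1 - failed_interactions / total_interactions


def yearly_budget_minutes(target: float) -> float:
    """Equivalent minutes of full outage per year that a target tolerates."""
    return (1 - target) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative numbers: 20,000 failed out of 100 million interactions.
    print(f"{fci_availability(100_000_000, 20_000):.4%}")  # 99.9800%
    for target in (0.9998, 0.9999):
        print(f"{target:.2%}: {yearly_budget_minutes(target):.1f} min/year")
```

Expressing the target as a budget rather than a percentage is a common SRE framing: a "0.01%" change sounds small, but it means the organization must tolerate only half as many failed interactions as before.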

BibTeX
@conference {247281,
author = {Aniket Kulkarni},
title = {Delivering Business Impact through Culture Change. How We Saved Millions by Celebrating Failure through Learnings},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = mar,
}