Reducing MTTR and False Escalations: Event Correlation at LinkedIn

Tuesday, March 14, 2017 - 9:55 am–10:50 am

Michael Kehoe, LinkedIn

Abstract: 

LinkedIn’s production stack is made up of over 900 applications and over 2,200 internal APIs. Because any given application has many interconnected pieces, it is difficult to escalate to the right person in a timely manner.

To combat this, LinkedIn built an Event Correlation Engine that monitors service health and maps dependencies between services, so that incidents are escalated to the SREs who own the unhealthy service.
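As a rough illustration of the idea only, and not LinkedIn's actual engine, the sketch below walks a dependency graph from an alerting service and reports the unhealthy services that have no unhealthy dependencies of their own, together with their owners. The service names, owners, the `Service` type, and the `find_root_causes` function are all hypothetical, and health is reduced to a single boolean for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    owner: str                 # hypothetical on-call team for this service
    healthy: bool = True       # assumed to come from a monitoring system
    depends_on: list[str] = field(default_factory=list)


def find_root_causes(services: dict[str, Service], alerting: str) -> list[Service]:
    """Follow unhealthy dependencies from the alerting service.

    Returns the unhealthy services that have no unhealthy dependencies
    themselves; these are the likeliest escalation targets.
    """
    roots: list[Service] = []
    seen: set[str] = set()
    stack = [alerting]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        svc = services[name]
        if svc.healthy:
            continue  # a healthy service is not a candidate root cause
        unhealthy_deps = [d for d in svc.depends_on if not services[d].healthy]
        if unhealthy_deps:
            stack.extend(unhealthy_deps)  # the problem is further downstream
        else:
            roots.append(svc)
    return roots


# Hypothetical topology: frontend -> profile-api -> storage, frontend -> search.
services = {
    "frontend": Service("frontend", "frontend-sre", False, ["profile-api", "search"]),
    "profile-api": Service("profile-api", "profile-sre", False, ["storage"]),
    "search": Service("search", "search-sre", True, []),
    "storage": Service("storage", "storage-sre", False, []),
}

for svc in find_root_causes(services, "frontend"):
    print(f"escalate to {svc.owner} (unhealthy service: {svc.name})")
```

In this made-up topology the alert fires on the frontend, but the escalation goes to the owners of the storage service rather than to the frontend or profile-api teams, which is the kind of false escalation the talk aims to avoid.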

We’ll discuss the approach we used to build the correlation engine and how it has been used at LinkedIn to reduce incident impact and improve quality of life for LinkedIn’s on-call engineers.

Michael Kehoe, Staff Site Reliability Engineer on the Production-SRE team, joined the LinkedIn operations team as a new college graduate in January 2014. Prior to that, Michael studied Engineering at the University of Queensland (Australia), where he majored in Electrical Engineering. While studying, he interned at NASA Ames Research Center, working on the PhoneSat project.

BibTeX
@conference {201815,
author = {Michael Kehoe},
title = {Reducing {MTTR} and False Escalations: Event Correlation at LinkedIn},
year = {2017},
address = {San Francisco, CA},
publisher = {{USENIX} Association},
month = mar,
}
