Building Centralized Caching Infrastructure at Scale

Friday, June 14, 2019 - 11:00 am–12:00 pm

James Won, LinkedIn

Abstract: 

Caching is integral to any large-scale web operation. LinkedIn formed a dedicated caching team in 2017, and since then we have built out automation and infrastructure to support over 7 million queries per second across more than one hundred clusters.

In this talk, I will walk through:

  • Why this team needed to exist
  • What we wanted to improve (e.g. tighter integration with existing deployment infrastructure)
  • How we integrated a third-party product into our deployment system
  • Things we wish we had done differently after implementing our initial automation and tooling
  • Implementing seamless upgrades (and how this compares to the way things were done in the past)
  • Transitioning from running as root to running as a non-root user
  • Tooling we created to provision stores quickly
  • Where we want to take caching at LinkedIn
  • Things to consider if your team provides a datastore as a service

James Won, LinkedIn

James Won is a Staff Site Reliability Engineer at LinkedIn, responsible for keeping its caching infrastructure running smoothly and scaling reliably. He not only spends time on the day-to-day operations of maintaining caching infrastructure but is also a huge fan of Python, so he automates as much as he possibly can to reduce human error and make tasks as self-service as possible.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference{233319,
  author = {James Won},
  title = {Building Centralized Caching Infrastructure at Scale},
  year = {2019},
  address = {Singapore},
  publisher = {USENIX Association},
  month = jun
}

Presentation Video