Building Centralized Caching Infrastructure at Scale

Friday, June 14, 2019 - 11:00 am–12:00 pm

James Won, LinkedIn

Abstract: 

Caching is integral to any large-scale web operation. LinkedIn formed a dedicated caching team in 2017, and since then we have built out automation and infrastructure to support over 7 million queries per second across more than one hundred clusters.

In this talk, I will walk through:

  • Why this team needed to exist
  • What we wanted to improve (e.g. tighter integration with existing deployment infrastructure)
  • How we integrated a third-party product into our deployment system
  • Things we wish we had done differently after implementing our initial automation and tooling
  • Implementing seamless upgrades (and how this compares to the way things were done in the past)
  • Transitioning from running as root to running as a non-root user
  • Tooling we created to provision stores quickly
  • Where we want to take caching at LinkedIn
  • Things to consider if your team provides a datastore as a service

James Won, LinkedIn

James Won is a Staff Site Reliability Engineer at LinkedIn, responsible for keeping its caching infrastructure running smoothly and scaling reliably. He not only spends time on the day-to-day operations of maintaining caching infrastructure but is also a huge fan of Python, so he automates as much as he possibly can to reduce human error and make tasks as self-service as possible.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference{233319,
  author = {James Won},
  title = {Building Centralized Caching Infrastructure at Scale},
  year = {2019},
  address = {Singapore},
  publisher = {USENIX Association},
  month = jun
}

Presentation Video