Managing Capacity @ LinkedIn

Wednesday, May 24, 2017 - 9:55am-10:50am

Anuprita Harkare, LinkedIn

Abstract: 

Have you ever struggled with planning and managing your data ingestion platform, and spent countless nights figuring out whether you have enough capacity or are over-provisioned? Is cost to serve a concern for you? If so, you are not alone.

LinkedIn serves content to millions of unique users. This generates a huge volume of data from member profiles, connections, posts, and other activity on the platform. These voluminous, fast-moving datasets need to be ingested seamlessly from many different data sources and made available for analysis with low latency and a consistent level of data quality. In this talk, you will learn how we are tackling this at LinkedIn.

Anuprita Harkare, LinkedIn

Anuprita works for LinkedIn in the Data Systems team as a Site Reliability Engineer. In her current role, she is responsible for LinkedIn's ETL infrastructure and takes care of critical data pipelines. Her day job consists of automation and building tools using Python and Java. Apart from that, she actively works on Hadoop, Hive, Pig, Gobblin, and a multitude of other big data technologies.


BibTeX
@conference {202785,
author = {Anuprita Harkare},
title = {Managing Capacity @ LinkedIn},
year = {2017},
publisher = {{USENIX} Association},
month = may,
}
