Serving Large-scale Batch Computed Data with Project Voldemort

Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, and Sam Shah, LinkedIn Corp.

Current serving systems lack the ability to bulk load massive immutable data sets without affecting serving performance. The performance degradation is largely due to index creation and modification, because CPU and memory resources are shared with request serving. We have extended Project Voldemort, a general-purpose distributed storage and serving system inspired by Amazon's Dynamo, to support bulk loading terabytes of read-only data. This extension constructs the index offline by leveraging the fault tolerance and parallelism of Hadoop. Compared to MySQL, our compact storage format and data deployment pipeline scale to twice the request throughput while maintaining sub-5 ms median latency. At LinkedIn, the largest professional social network, this system has been running in production for more than two years and serves many of the data-intensive social features on the site.

BibTeX

@inproceedings {266262,
title = {Serving Large-scale Batch Computed Data with Project Voldemort},
booktitle = {10th USENIX Conference on File and Storage Technologies (FAST 12)},
year = {2012},
address = {San Jose, CA},
url = {https://www.usenix.org/conference/fast12/serving-large-scale-batch-computed-data-project-voldemort},
publisher = {USENIX Association},
month = feb
}
