Scaling HDFS with Consistent Reads from Standby Replicas

Monday, February 24, 2020 - 2:00 pm–2:30 pm

Konstantin Shvachko, LinkedIn

Abstract: 

We introduce a novel technique for serving read requests from Standby replicas of a metadata service in an active-standby architecture. The technique is implemented in the Hadoop Distributed File System (HDFS). It substantially improves the performance of the metadata service and the overall scalability of the entire system. We introduce a strong consistency model and show how HDFS addresses both the read-your-own-writes and third-party-communication consistency challenges. The talk will outline the HDFS architecture and its scalability and performance constraints, describe the architecture of consistent reads from Standby, and present performance results from a real-life, exponentially growing Hadoop cluster at LinkedIn.

Konstantin Shvachko, LinkedIn

Konstantin V. Shvachko is an expert in Big Data technologies, file systems, and storage solutions. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin is known as an open-source software developer, author, inventor, and entrepreneur. He is a senior staff software engineer at LinkedIn.


BibTeX
@conference {246550,
author = {Konstantin Shvachko},
title = {Scaling {HDFS} with Consistent Reads from Standby Replicas},
year = {2020},
address = {Santa Clara, CA},
publisher = {{USENIX} Association},
month = feb,
}
