Scaling Distributed File Systems in Resource-Harvesting Datacenters

Authors: 

Pulkit A. Misra, Duke University; Íñigo Goiri, Jason Kace and Ricardo Bianchini, Microsoft Research

Abstract: 

Datacenters can use distributed file systems to store data for batch processing on the same servers that run latency-critical services. Taking advantage of this storage capacity involves minimizing interference with the co-located services, while implementing user-friendly, efficient, and scalable file system access. Unfortunately, current systems fail one or more of these requirements, and must be manually partitioned across independent subclusters. Thus, in this paper, we introduce techniques for automatically and transparently scaling such file systems to entire resource-harvesting datacenters. We create a layer of software in front of the existing metadata managers, assign servers to subclusters to minimize interference and data movement, and smartly migrate data across subclusters in the background. We implement our techniques in HDFS, and evaluate them using simulation of 10 production datacenters and a real 4k-server deployment. Our results show that our techniques produce high file access performance, and high data durability and availability, while migrating a limited amount of data. We recently deployed our system onto 30k servers in Bing’s datacenters, and discuss lessons from this deployment.

BibTeX
@inproceedings {203225,
author = {Pulkit A. Misra and {\'I}{\~n}igo Goiri and Jason Kace and Ricardo Bianchini},
title = {Scaling Distributed File Systems in {Resource-Harvesting} Datacenters},
booktitle = {2017 USENIX Annual Technical Conference (USENIX ATC 17)},
year = {2017},
isbn = {978-1-931971-38-6},
address = {Santa Clara, CA},
pages = {799--811},
url = {https://www.usenix.org/conference/atc17/technical-sessions/presentation/misra},
publisher = {USENIX Association},
month = jul
}
