A Tale of Two Erasure Codes in HDFS

Mingyuan Xia; Mohit Saxena; Mario Blaum; David A. Pease

Mingyuan Xia, McGill University; Mohit Saxena, Mario Blaum, and David A. Pease, IBM Research Almaden

Distributed storage systems are increasingly transitioning to erasure codes because they offer higher reliability at significantly lower storage cost than data replication. However, these codes trade off recovery performance, as they require multiple disk reads and network transfers to reconstruct an unavailable data block. As a result, most existing systems use an erasure code optimized for either storage overhead or recovery performance.

In this paper, we present HACFS, a new erasure-coded storage system that instead uses two different erasure codes and dynamically adapts to workload changes. It uses a fast code to optimize for recovery performance and a compact code to reduce storage overhead. A novel conversion mechanism efficiently upcodes and downcodes data blocks between the fast and compact codes. We show that the HACFS design techniques are generic and successfully apply them to two different code families: Product and LRC codes.
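The tension between a fast code and a compact code can be made concrete with a small calculation. The sketch below assumes a simple product-code layout in which data blocks form a grid with one extra parity row and one extra parity column, and in which a lost data block is rebuilt from the surviving blocks of its row or its column, whichever is shorter. The grid shapes (2x5 and 6x5) and the function names are illustrative assumptions; the abstract itself gives no code parameters.

```python
from fractions import Fraction


def storage_overhead(rows, cols):
    """Blocks stored per data block for a rows x cols grid of data blocks
    plus one parity row and one parity column (assumed layout)."""
    return Fraction((rows + 1) * (cols + 1), rows * cols)


def degraded_read_cost(rows, cols):
    """Blocks read to rebuild one lost data block, using the shorter of its
    row or column. A row holds `cols` data blocks plus one parity block, so
    `cols` survivors are read; a column likewise needs `rows` reads."""
    return min(rows, cols)


# Hypothetical shapes chosen only to show the trade-off.
codes = {"fast": (2, 5), "compact": (6, 5)}

for name, (rows, cols) in codes.items():
    overhead = float(storage_overhead(rows, cols))
    cost = degraded_read_cost(rows, cols)
    print(f"{name}: overhead {overhead:.2f}x, degraded read cost {cost} blocks")
```

Under these assumptions the smaller grid needs fewer reads to recover a block but stores more parity per data block, while the larger grid does the opposite, which is the trade-off a two-code system tries to balance.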

We have implemented HACFS as an extension to the Hadoop Distributed File System (HDFS) and experimentally evaluated it with five different workloads from production clusters. HACFS always maintains a low storage overhead and significantly improves recovery performance compared to three popular single-code storage systems. It reduces degraded read latency by up to 46%, and reconstruction time and disk/network traffic by up to 45%.

BibTeX

@inproceedings {188446,
author = {Mingyuan Xia and Mohit Saxena and Mario Blaum and David A. Pease},
title = {A Tale of Two Erasure Codes in {HDFS}},
booktitle = {13th USENIX Conference on File and Storage Technologies (FAST 15)},
year = {2015},
isbn = {978-1-931971-201},
address = {Santa Clara, CA},
pages = {213--226},
url = {https://www.usenix.org/conference/fast15/technical-sessions/presentation/xia},
publisher = {USENIX Association},
month = feb
}
