The Truth About MapReduce Performance on SSDs
Karthik Kambatla, Cloudera Inc. and Purdue University; Yanpei Chen, Cloudera Inc.
Solid-state drives (SSDs) are increasingly being considered as a viable alternative to rotational hard-disk drives (HDDs). In this paper, we investigate if SSDs improve the performance of MapReduce workloads and evaluate the economics of using PCIe SSDs either in place of or in addition to HDDs. Our contributions are (1) a method of benchmarking MapReduce performance on SSDs and HDDs under constant-bandwidth constraints, (2) identifying cost-per-performance as a more pertinent metric than cost-per-capacity when evaluating SSDs versus HDDs for performance, and (3) quantifying that SSDs can achieve up to 70% higher performance for 2.5x higher cost-per-performance.
Yanpei Chen, Cloudera Inc.
Yanpei Chen is a member of the Performance Engineering Team at Cloudera, where he works on internal and competitive performance measurement and optimization. His work touches upon multiple interconnected computation frameworks, including Cloudera Search, Cloudera Impala, Apache Hadoop, Apache HBase, and Apache Hive. He is the lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool that allows someone to synthesize and replay MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool used to certify many Cloudera partners. He received his doctorate at the UC Berkeley AMP Lab, where he worked on performance-driven, large-scale system design and evaluation.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Karthik Kambatla and Yanpei Chen},
title = {The Truth About {MapReduce} Performance on {SSDs}},
booktitle = {28th Large Installation System Administration Conference (LISA14)},
year = {2014},
isbn = {978-1-931971-17-1},
address = {Seattle, WA},
pages = {118--126},
url = {https://www.usenix.org/conference/lisa14/conference-program/presentation/kambatla},
publisher = {USENIX Association},
month = nov
}
connect with us