The Truth About MapReduce Performance on SSDs
The LISA conference has long served as the annual vendor-neutral meeting place for the wider system administration community. The LISA14 program recognized the overlap and differences between traditional and modern IT operations and engineering, and developed a highly curated program around five key topics: Systems Engineering, Security, Culture, DevOps, and Monitoring/Metrics. The program included 22 half- and full-day training sessions; 10 workshops; and a conference program consisting of 50 invited talks, panels, refereed paper presentations, and mini-tutorials.
Karthik Kambatla, Cloudera Inc. and Purdue University; Yanpei Chen, Cloudera Inc.
Solid-state drives (SSDs) are increasingly being considered as a viable alternative to rotational hard-disk drives (HDDs). In this paper, we investigate whether SSDs improve the performance of MapReduce workloads and evaluate the economics of using PCIe SSDs either in place of or in addition to HDDs. Our contributions are (1) a method of benchmarking MapReduce performance on SSDs and HDDs under constant-bandwidth constraints, (2) the identification of cost-per-performance as a more pertinent metric than cost-per-capacity when evaluating SSDs versus HDDs for performance, and (3) a quantification showing that SSDs can achieve up to 70% higher performance for 2.5x higher cost-per-performance.
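The difference between the two metrics named in the abstract can be made concrete with a small sketch. It assumes that "performance" is expressed as sustained bandwidth in MB/s, which is one plausible reading and not necessarily the paper's exact definition. The `Drive` type, the helper names, and every price, capacity, and bandwidth figure below are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    """A storage device described by hypothetical, illustrative figures."""
    name: str
    price_usd: float
    capacity_gb: float
    bandwidth_mbps: float  # assumed performance proxy: sustained MB/s


def cost_per_capacity(drive: Drive) -> float:
    """Dollars per gigabyte of storage."""
    return drive.price_usd / drive.capacity_gb


def cost_per_performance(drive: Drive) -> float:
    """Dollars per MB/s of bandwidth (assumed performance unit)."""
    return drive.price_usd / drive.bandwidth_mbps


def ssd_to_hdd_ratios(ssd: Drive, hdd: Drive) -> tuple[float, float]:
    """Return (capacity-cost ratio, performance-cost ratio) of SSD over HDD."""
    return (
        cost_per_capacity(ssd) / cost_per_capacity(hdd),
        cost_per_performance(ssd) / cost_per_performance(hdd),
    )


# Placeholder numbers chosen only to make the arithmetic easy to follow.
hdd = Drive("hypothetical HDD", price_usd=400.0, capacity_gb=4000.0, bandwidth_mbps=200.0)
ssd = Drive("hypothetical SSD", price_usd=3000.0, capacity_gb=1000.0, bandwidth_mbps=1000.0)

if __name__ == "__main__":
    capacity_ratio, performance_ratio = ssd_to_hdd_ratios(ssd, hdd)
    print(f"SSD/HDD cost-per-capacity ratio:    {capacity_ratio:.1f}x")
    print(f"SSD/HDD cost-per-performance ratio: {performance_ratio:.1f}x")
```

With these placeholder inputs the SSD looks far more expensive per gigabyte than per unit of bandwidth, which illustrates why the choice of metric matters when drives are being bought for performance and not for capacity.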
Yanpei Chen, Cloudera Inc.

Yanpei Chen is a member of the Performance Engineering Team at Cloudera, where he works on internal and competitive performance measurement and optimization. His work touches upon multiple interconnected computation frameworks, including Cloudera Search, Cloudera Impala, Apache Hadoop, Apache HBase, and Apache Hive. He is the lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool that allows someone to synthesize and replay MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool used to certify many Cloudera partners. He received his doctorate at the UC Berkeley AMP Lab, where he worked on performance-driven, large-scale system design and evaluation.
author = {Karthik Kambatla and Yanpei Chen},
title = {The Truth About {MapReduce} Performance on {SSDs}},
booktitle = {28th Large Installation System Administration Conference (LISA14)},
year = {2014},
isbn = {978-1-931971-17-1},
address = {Seattle, WA},
pages = {118--126},
url = {https://www.usenix.org/conference/lisa14/conference-program/presentation/kambatla},
publisher = {USENIX Association},
month = nov
}