Five Pitfalls for Benchmarking Big Data Systems

Invited Talk
Wednesday, November 12, 2014 - 4:45pm-5:30pm

Yanpei Chen and Gwen Shapira, Cloudera, Inc.

Abstract: 

Performance is an increasingly important attribute of Big Data systems as focus shifts from batch processing to real-time analysis and to consolidated multi-tenant systems. One of the little-understood challenges in scaling data systems is properly defining and measuring performance. The complexity, diversity, and scale of Big Data systems make this a difficult task, and we frequently encounter haphazard benchmarks that lead to bad technology choices, poor purchasing decisions, and suboptimal cluster operations. This talk draws on performance engineering and field services experience from a leading Big Data vendor. We will cover the most common performance benchmarking pitfalls and share practical advice on how to avoid them with rigorous metrics and measurement methods.

Yanpei Chen, Cloudera Inc.

Yanpei Chen is a member of the Performance Engineering Team at Cloudera, where he works on internal and competitive performance measurement and optimization. His work touches upon multiple interconnected computation frameworks, including Cloudera Search, Cloudera Impala, Apache Hadoop, Apache HBase, and Apache Hive. He is the lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool for synthesizing and replaying MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool used to certify many Cloudera partners. He received his doctorate at the UC Berkeley AMP Lab, where he worked on performance-driven, large-scale system design and evaluation.

Gwen Shapira, Cloudera Inc.

Gwen Shapira is a Solutions Architect at Cloudera. She has 15 years of experience working with customers to design scalable data architectures, having worked as a data warehouse DBA, ETL developer, and senior consultant. She specializes in migrating data warehouses to Hadoop, integrating Hadoop with relational databases, building scalable data processing pipelines, and scaling complex data analysis algorithms.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

© USENIX

LISA is a registered trademark of the USENIX Association.