Towards Pre-Deployment Detection of Performance Failures in Cloud Distributed Systems

Riza O. Suminto, University of Chicago; Agung Laksono and Anang D. Satria, Surya University; Thanh Do, Microsoft Gray Systems Lab; Haryadi S. Gunawi, University of Chicago

Modern distributed systems ("cloud systems") have emerged as a dominant backbone for many of today's applications. They come in different forms, such as scale-out file systems, key-value stores, computing frameworks, and synchronization and cluster management services. As these systems collectively become the "cloud operating system", users expect high dependability, including performance stability. Unfortunately, the complexity of the software and of the environment in which it must run has outpaced existing testing and debugging tools. Cloud systems must run at scale with different topologies, execute complex distributed protocols, face load fluctuations and a wide range of hardware faults, and serve users with diverse job characteristics.

One important type of failure is the performance failure, a situation in which a system (e.g., Hadoop) does not deliver the expected performance (e.g., a job takes 10x longer than usual). Conversations with cloud engineers suggest that performance stability is often more important than performance optimization; when performance failures happen, users are frustrated, systems waste and underutilize resources, and long debugging efforts are required to find and fix the problems. Sadly, performance failures are still common; our previous work shows that 22% of vital issues reported by cloud system developers relate to performance bugs.

In this paper, we focus on answering the following three questions: What is the root-cause anatomy of performance bugs that appear in cloud systems? What is missing from the state of the art in detecting performance bugs? What new directions can prevent performance failures from happening in the field?

BibTeX
@inproceedings {190601,
author = {Riza O. Suminto and Agung Laksono and Anang D. Satria and Thanh Do and Haryadi S. Gunawi},
title = {Towards {Pre-Deployment} Detection of Performance Failures in Cloud Distributed Systems},
booktitle = {7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15)},
year = {2015},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotcloud15/workshop-program/presentation/suminto},
publisher = {USENIX Association},
month = jul
}