HotCloud '15

Towards Pre-Deployment Detection of Performance Failures in Cloud Distributed Systems

Authors: 

Riza O. Suminto, University of Chicago; Agung Laksono and Anang D. Satria, Surya University; Thanh Do, Microsoft Gray Systems Lab; Haryadi S. Gunawi, University of Chicago

Abstract: 

Modern distributed systems ("cloud systems") have emerged as a dominant backbone for many of today's applications. They come in different forms, such as scale-out file systems, key-value stores, computing frameworks, and synchronization and cluster management services. As these systems collectively become the "cloud operating system", users expect high dependability, including performance stability. Unfortunately, the complexity of the software and of the environment in which it must run has outpaced existing testing and debugging tools. Cloud systems must run at scale with different topologies, execute complex distributed protocols, face load fluctuations and a wide range of hardware faults, and serve users with diverse job characteristics.

One important type of failure is the performance failure, a situation in which a system (e.g., Hadoop) does not deliver the expected performance (e.g., a job takes 10x longer than usual). Conversations with cloud engineers suggest that performance stability is often more important than performance optimization; when performance failures happen, users are frustrated, systems waste and underutilize resources, and long debugging efforts are required to find and fix the problems. Sadly, performance failures are still common; our previous work shows that 22% of vital issues reported by cloud system developers relate to performance bugs.

In this paper, we focus on answering the following three questions: What is the root-cause anatomy of performance bugs that appear in cloud systems? What is missing in the state of the art of detecting performance bugs? What novel directions can prevent performance failures from happening in the field?
