USENIX

Making Problem Diagnosis Work for Large-Scale, Production Storage Systems

Authors: 

Michael P. Kasick and Priya Narasimhan, Carnegie Mellon University; Kevin Harms, Argonne National Laboratory

Abstract: 

Intrepid has a very large production GPFS storage system consisting of 128 file servers, 32 storage controllers, 1,152 disk arrays, and 11,520 total disks. In such a large system, performance problems are both inevitable and difficult to troubleshoot. We present our experiences in taking an automated problem diagnosis approach from proof of concept on a 12-server test-bench parallel-filesystem cluster and making it work on Intrepid's storage system. We also present a 15-month case study of problems observed from the analysis of 624 GB of Intrepid's instrumentation data, in which we diagnose a variety of performance-related storage-system problems in a matter of hours, as compared to the days or longer required with manual approaches.

Michael P. Kasick, Carnegie Mellon University

Priya Narasimhan, Carnegie Mellon University/YinzCam, Inc.

Kevin Harms, Argonne National Laboratory

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

© USENIX

LISA is a registered trademark of the USENIX Association.
