general information

Venue
DoubleTree by Hilton Dublin - Burlington Road
Leeson Street Upper
Dublin 4, Ireland

Downtime Budgets

Cory Lueninghoener, Los Alamos National Laboratory

Abstract: 

The concept of the error budget is a great way to hack SLAs and make them into a positive tool for system engineers. But how can you take the same idea from a world that handles millions of transactions in a day to one that handles hundreds? High Performance Computing jobs run for hours, days, or weeks at a time, resulting in unique challenges related to system availability, maintenance, and experimentation. This talk will explore a way to modify the error budget concept to fit in an HPC environment by applying the same idea to cluster outages, both planned and unplanned, and to ultimately give customers the best computing environment possible.
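As a rough illustration of the idea in the abstract, the sketch below treats an availability target as a budget of downtime hours and charges both planned and unplanned outages against it. The 95% target, the 30-day period, the outage durations, and all names here are hypothetical; the talk's actual accounting method is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours: float    # length of the outage in hours
    planned: bool   # True for scheduled maintenance, False for failures


def downtime_budget_hours(availability_target: float, period_hours: float) -> float:
    """Hours of downtime allowed in the period for a given availability target."""
    return (1.0 - availability_target) * period_hours


def remaining_budget_hours(budget_hours: float, outages: list[Outage]) -> float:
    """Budget left after subtracting every outage, planned or unplanned."""
    return budget_hours - sum(o.hours for o in outages)


# Hypothetical numbers: a 95% availability target over a 30-day month.
budget = downtime_budget_hours(0.95, 30 * 24)
outages = [Outage(hours=8.0, planned=True), Outage(hours=5.5, planned=False)]
remaining = remaining_budget_hours(budget, outages)
print(f"budget: {budget:.1f} h, remaining: {remaining:.1f} h")
```

With these assumed figures the month starts with roughly 36 hours of budget, and whatever is left after the two outages is what could be spent on further maintenance or experimentation.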

Cory Lueninghoener leads the HPC Design Group at Los Alamos National Laboratory. He has helped design, build, and manage some of the largest scientific computing resources in the world, including systems ranging in size from 100,000 to 900,000 processors. He is especially interested in turning large-scale system research into practice, and has worked on configuration management and system management tools. Cory was co-chair of LISA 2015 and is active in the large-scale system engineering community.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {208536,
author = {Cory Lueninghoener},
title = {Downtime Budgets},
year = {2016},
address = {Dublin},
publisher = {USENIX Association},
month = jul,
}
