general information

Registration Fee: $400
Thanks to generous sponsorship, early bird pricing is now permanent for SREcon15!

Venue:
Hyatt Regency Santa Clara
5101 Great America Pkwy
Santa Clara, CA 95054


Error Budgets and Risks

Marc Alvidrez, Google

Abstract: 

Striving for Imperfection: Using an error budget to move fast without compromising high reliability

It is easy to assume that Site Reliability Engineers aim to build systems that never go down. That assumption misses the fact that 100% reliability is almost never the goal. Instead, our task is to trade off reliability against the many other goals we have for our services. SREs want to provide great service to end users and customers, and also to have the flexibility to change systems often and quickly. We want to ensure that the queries and the revenue keep flowing, and to do so as efficiently as possible, provisioning no more excess than is necessary to deliver good service. Taking an engineering approach to meeting these goals means making these tradeoffs measurable, and this is where error budgets come in. Your error budget is a measure of risk: it is the amount of headroom you have above your SLA. Being smart about how you manage and spend this error budget is one of the best tools SRE has for meeting the various competing goals that services at Internet scale present.
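As a rough illustration of how such a budget might be made measurable, the sketch below treats the error budget as the number of failed requests a reliability target tolerates over some window, minus the failures already observed. The abstract does not give a formula, so the request-based definition and the names `ErrorBudget` and `compute_error_budget` are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical container for a request-based error budget."""
    allowed_failures: float  # failures the target tolerates in the window
    spent_failures: int      # failures actually observed in the window

    @property
    def remaining(self) -> float:
        # Negative when the service has overspent its budget.
        return self.allowed_failures - self.spent_failures

    @property
    def fraction_remaining(self) -> float:
        return self.remaining / self.allowed_failures


def compute_error_budget(target: float, total_requests: int,
                         failed_requests: int) -> ErrorBudget:
    """Assumed definition: budget = (1 - target) * total_requests."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")
    allowed = (1.0 - target) * total_requests
    return ErrorBudget(allowed_failures=allowed, spent_failures=failed_requests)


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the talk.
    budget = compute_error_budget(target=0.999, total_requests=1_000_000,
                                  failed_requests=250)
    print(f"allowed failures: {budget.allowed_failures:.0f}")
    print(f"budget remaining: {budget.fraction_remaining:.0%}")
```

Under this assumed definition, a team with most of its budget left has room to ship changes quickly, while a team whose `remaining` value is near or below zero would slow down and spend effort on reliability instead.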

Marc Alvidrez is a Senior Staff Site Reliability Engineer with Google. He joined the company in 2004 as an early SRE and has since led a variety of teams responsible for both infrastructure and major user-facing services. These have included the first team responsible for Google File System (GFS) and the teams responsible for Google's Display and AdSense advertising serving systems, Google+, and Google Photos. Prior to Google, he held systems engineering roles at Vodafone and at Internet startup Topica, where he was the Director of Operations.


Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.


© USENIX

SREcon is a registered trademark of the USENIX Association.
