USENIX

general information

Early Bird Registration Deadline: March 16, 2016

SREcon16 is SOLD OUT.
No walkup registrations will be accepted.

Venue:
Hyatt Regency Santa Clara
5101 Great America Pkwy
Santa Clara, CA 95054

Operational Buddhism: Building Reliable Services from Unreliable Components

Ernie Souhrada, Pinterest

Abstract: 

The rise of utility computing has revolutionized much about the way organizations think about infrastructure and back-end serving systems compared to the "olden days" of dedicated physical data centers. However, in the final analysis, success is still driven by meeting your SLAs. If services are up and sufficiently performant, you win. If not, you lose.

In the traditional data center environment, fighting the uptime battle was typically driven by a philosophy I call "Operational Materialism." The primary goal of OM is preventing failures at the infrastructure layer. Mechanisms for doing so are plentiful and well understood, and many of them boil down to simply spending enough money to have at least N+1 of anything that might fail and cause significant downtime. Redundant power supplies, NIC bonding, replicated SANs, and hot-standby servers are some of the common artifacts of an OM world.

In the cloud, however, Operational Materialism cannot succeed. Although the typical cloud provider tends to be holistically reliable, there are no guarantees that any individual virtual instance will not randomly or intermittently drop off the network or be terminated outright. Yet we still need to keep our services up and running and meet our SLAs, and thus we need a different mindset that accounts for the fundamentally opaque and ephemeral nature of the public cloud.

In this talk, I will present an alternative to OM, a worldview that I refer to as "Operational Buddhism." Like traditional Buddhism, OB has Four Noble Truths:

    1. Cloud-based servers can fail at any time for any reason.
    2. Trying to prevent this server failure is an endless source of suffering for DBAs and SREs alike.
    3. Accepting the impermanence of individual servers, we can focus on designing systems that are failure-resilient, rather than failure-resistant.
    4. We can escape the cycle of suffering and create a better experience for our customers, users, and colleagues.

To illustrate these concepts with concrete examples, I will discuss how configuration management, automation, and service discovery help us to practice Operational Buddhism at Pinterest for both stateful (MySQL, HBase) and stateless (web) services. Moreover, as our path is not the only road to infrastructure enlightenment, I'll also talk about some of the roads not taken, including the debate over Infrastructure-as-a-Service (IaaS) vs. Platform-as-a-Service (PaaS).
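As a rough illustration of the failure-resilient idea (not the implementation described in the talk), the sketch below shows a client that asks a service-discovery registry for live hosts and moves on when one of them fails, instead of trying to keep any single host alive. Everything in it is hypothetical: `Registry` is an in-memory stand-in for a real service-discovery system, and `HostDown`, `call_with_failover`, and the host names are invented for the example.

```python
import random


class Registry:
    """Hypothetical in-memory stand-in for a service-discovery system."""

    def __init__(self):
        self._hosts = {}  # service name -> set of registered host names

    def register(self, service, host):
        self._hosts.setdefault(service, set()).add(host)

    def deregister(self, service, host):
        self._hosts.get(service, set()).discard(host)

    def lookup(self, service):
        return sorted(self._hosts.get(service, set()))


class HostDown(Exception):
    """Raised by the (hypothetical) transport when a host is unreachable."""


def call_with_failover(registry, service, send):
    """Try registered hosts in random order; drop failed hosts and continue."""
    hosts = registry.lookup(service)
    random.shuffle(hosts)
    for host in hosts:
        try:
            return send(host)
        except HostDown:
            # Accept the failure: remove the host and try the next one.
            registry.deregister(service, host)
    raise RuntimeError(f"no healthy hosts left for {service!r}")


if __name__ == "__main__":
    registry = Registry()
    for name in ("web-1", "web-2", "web-3"):
        registry.register("web", name)

    dead = {"web-1", "web-3"}  # simulate two terminated instances

    def send(host):
        if host in dead:
            raise HostDown(host)
        return f"200 OK from {host}"

    print(call_with_failover(registry, "web", send))  # 200 OK from web-2
```

The request succeeds even though two of the three instances are gone, because the caller treats any individual host as disposable. A production system would also need health checks, timeouts, and replacement of terminated instances, which this sketch leaves out.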

Ernie Souhrada is a database engineer on the SRE team at Pinterest where his current focus is on improving the performance and operational efficiency of a petabyte-scale hybrid deployment of MySQL, HBase, and Redis. Over the past two decades, Ernie has worked in almost every aspect of information technology, from network engineering and software development to systems administration and information security. When not slinging data or thinking about infrastructure, he can be found on a ski slope, at a sushi bar, or in search of the next great psytrance track. Ernie holds a B.S. in mathematics and a B.A. in political science from Arizona State University.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {208613,
author = {Ernie Souhrada},
title = {Operational Buddhism: Building Reliable Services from Unreliable Components},
year = {2016},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = apr,
}
© USENIX

SREcon is a registered trademark of the USENIX Association.
