Venue:
Hyatt Regency Santa Clara
5101 Great America Pkwy
Santa Clara, CA 95054
Operational Buddhism: Building Reliable Services from Unreliable Components
Ernie Souhrada, Pinterest
The rise of utility computing has revolutionized much about the way organizations think about infrastructure and back-end serving systems compared to the "olden days" of dedicated physical data centers. However, in the final analysis, success is still driven by meeting your SLAs. If services are up and sufficiently performant, you win. If not, you lose.
In the traditional data center environment, fighting the uptime battle was typically driven by a philosophy I call "Operational Materialism." The primary goal of OM is preventing failures at the infrastructure layer, and the mechanisms for doing so are plentiful and well understood. Many of them boil down to simply spending enough money to have at least N+1 of anything whose failure would cause significant downtime. Redundant power supplies, NIC bonding, replicated SANs, and hot-standby servers are some of the common artifacts of an OM world.
In the cloud, however, Operational Materialism cannot succeed. Although the typical cloud provider tends to be holistically reliable, there are no guarantees that any individual virtual instance will not randomly or intermittently drop off the network or be terminated outright. Yet we still need to keep our services up and running and meet our SLAs, and thus we need a different mindset that accounts for the fundamentally opaque and ephemeral nature of the public cloud.
In this talk, I will present an alternative to OM, a worldview that I refer to as "Operational Buddhism." Like traditional Buddhism, OB has Four Noble Truths:
1. Cloud-based servers can fail at any time for any reason.
2. Trying to prevent this server failure is an endless source of suffering for DBAs and SREs alike.
3. Accepting the impermanence of individual servers, we can focus on designing systems that are failure-resilient, rather than failure-resistant.
4. We can escape the cycle of suffering and create a better experience for our customers, users, and colleagues.
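The arithmetic behind building a reliable service from unreliable servers can be sketched as follows. This example is not taken from the talk. It assumes that instance failures are independent (which real cloud outages often are not), and the 99% per-instance availability figure and the function name `service_availability` are hypothetical, chosen only for illustration.

```python
def service_availability(instance_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` instances is up.

    Assumption (not from the source): instance failures are independent,
    so the chance that all replicas are down at once is (1 - a) ** n.
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    all_down = (1.0 - instance_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical figure: each instance is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} replica(s): {service_availability(0.99, n):.6f}")
```

Under the independence assumption, each extra replica multiplies the chance of a total outage by the per-instance failure probability, which is why tolerating the loss of any single server can be cheaper than trying to prevent it.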
To illustrate these concepts with concrete examples, I will discuss how configuration management, automation, and service discovery help us to practice Operational Buddhism at Pinterest for both stateful (MySQL, HBase) and stateless (web) services. Moreover, as our path is not the only road to infrastructure enlightenment, I'll also talk about some of the roads not taken, including the debate over Infrastructure-as-a-Service (IaaS) vs. Platform-as-a-Service (PaaS).
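One way a client might combine service discovery with failure-resilience is sketched below. This is an illustrative assumption, not a description of Pinterest's implementation: `call_with_failover`, `AllHostsFailed`, the `discover` and `request` callables, and the host names are all hypothetical. The idea is that the client asks discovery for the current set of hosts on every call and moves on to another host when one is unreachable, rather than depending on any single server staying alive.

```python
import random
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class AllHostsFailed(Exception):
    """Raised when every attempted host refused or dropped the request."""


def call_with_failover(
    discover: Callable[[], Iterable[str]],
    request: Callable[[str], T],
    max_attempts: int = 3,
) -> T:
    """Try the request against up to `max_attempts` discovered hosts.

    `discover` and `request` are hypothetical stand-ins for a real
    service-discovery lookup and a real network call.
    """
    hosts: List[str] = list(discover())
    random.shuffle(hosts)  # spread load; no host is special
    errors: List[Tuple[str, Exception]] = []
    for host in hosts[:max_attempts]:
        try:
            return request(host)
        except ConnectionError as exc:
            # An individual server being gone is expected, not exceptional.
            errors.append((host, exc))
    raise AllHostsFailed(errors)


if __name__ == "__main__":
    dead = {"db-1", "db-2"}  # hypothetical hosts that have disappeared

    def fake_discover() -> List[str]:
        return ["db-1", "db-2", "db-3"]

    def fake_request(host: str) -> str:
        if host in dead:
            raise ConnectionError(f"{host} is unreachable")
        return f"ok from {host}"

    print(call_with_failover(fake_discover, fake_request))
```

A real deployment would also need timeouts, backoff, and care around non-idempotent writes to stateful services; those are omitted here to keep the sketch short.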
Ernie Souhrada is a database engineer on the SRE team at Pinterest, where his current focus is on improving the performance and operational efficiency of a petabyte-scale hybrid deployment of MySQL, HBase, and Redis. Over the past two decades, Ernie has worked in almost every aspect of information technology, from network engineering and software development to systems administration and information security. When not slinging data or thinking about infrastructure, he can be found on a ski slope, at a sushi bar, or in search of the next great psytrance track. Ernie holds a B.S. in mathematics and a B.A. in political science from Arizona State University.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Ernie Souhrada},
title = {Operational Buddhism: Building Reliable Services from Unreliable Components},
year = {2016},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = apr
}