Friday, May 30, 2014
9:00 a.m.–5:00 p.m.
Thank you for your interest in SREcon14. Due to reaching maximum capacity, registration is now closed. Registered attendees can pick up their badges beginning at 7:30 a.m. on May 30 in the Mezzanine at the Hyatt Regency Santa Clara.
The SREcon14 Program is
Hyatt Regency Santa Clara
5101 Great America Pkwy
Santa Clara, CA 95054
Thank you for joining us at SREcon14!
USENIX is excited to announce the launch of the inaugural SREcon14, which will bring together the site reliability engineering community to discuss the evolution and future of this technical realm. View the inaugural SREcon14 program.
There’s much more to running a large-scale architecture than just pure software development: in order to have a system that can run 24/7, reliability, performance, and efficiency must also be considered. When you look at every single piece of software that addresses these issues, many challenges are revealed, and more often than not, they go beyond the scope of traditional development teams. Site Reliability Engineering (or Production Engineering depending on the organization) has grown to address just these issues within large-scale systems.
Our goal for SREcon is to share information about the details and principles that drive us towards engineering excellence for the products, systems, and architectures that we maintain.
SRE professionals are invited to join us and exchange information about the strategies you used to deal with a certain type of challenge; an awesome bug-hunting experience where the complexity and scale of the work environment made the issue not only large, but extremely interesting and worth sharing; and the types of issues you face working at such a large scale. Lastly, we'll also discuss ideas or concepts that are always in the back of your mind that help make the job easier.
The SREcon program includes:
- Michael “Mikey” Dickerson's Closing Talk on "How Silicon Valley’s SREs saved Healthcare.gov"
- Panel discussions on Releasing at Scale, Disaster Preparedness, Load Shedding, and SRE Culture Fundamentals
- Invited Talks on design review, proactive monitoring, mobile, and cascading failures
- And much more!
The event will be split into talks and discussion sessions, with opportunities for participants to bring their own subjects and challenges for discussion with like-minded people. View the full program, and don't forget to submit your ideas for the open space discussion area.
Sabrina Farmer, Senior Site Reliability Manager, Google Apps
Andrew Fong, SRE Lead, Dropbox
Fernanda Weiden, Manager of Production Engineering, Facebook
Ben Treynor joined Google as Site Reliability Tsar in 2003. He is the founder of Google's Site Reliability team and grew it organically from an original core of 7 "production" engineers to its current ranks of more than 1,200 software engineers. SRE is responsible for everything from Google's internal software infrastructure to user services like Search, Gmail, and Ads, and the burgeoning Cloud Platform. Additionally, Ben is responsible for Google's worldwide internal and external network (since 2004) and its data centers and hardware operations (since 2009), and is part of the Google Cloud Platform management team (since late 2013).
Prior to Google, Ben held engineering management roles at Seven Networks, E.piphany, and Versant Object Technology, and started his career as a software engineer at Oracle in 1986. Ben holds a BS and MS in Computer Science from Stanford, and an MBA from UC Berkeley's Haas School.