Panel: Unsolved Problems in SRE

Note: Presentation times are in Coordinated Universal Time (UTC).

Thursday, 14 October, 2021 - 17:30-18:15

Moderator: Kurt Andersen, Blameless

Panelists: Niall Murphy, RelyAbility; Narayan Desai, Google; Laura Nolan, Slack; Xiao Li, JP Morgan Chase; Sandhya Ramu, LinkedIn

Abstract: 

Every field of endeavor has its leading edge where the answers are unclear and active exploration is warranted. Although the phrase "here be dragons" might be an appropriate warning, this panel of intrepid adventurers will venture into that unknown territory.

Kurt Andersen, Blameless

Kurt Andersen is the head of strategy for Blameless.com. Prior to that, he was one of the leads for the Product-SRE organization at LinkedIn. Across the full spectrum of IT influence, he is strongly committed to developing the best engineers and teams, and enabling them with the right ideas, tools, and connections at the right time. Kurt has been active in the anti-abuse and IETF standards communities for over 20 years. He has spoken at multiple conferences on various aspects of reliability, authentication, and security, and has written for O'Reilly. He also serves on the USENIX Board of Directors and as liaison to the SREcon conferences worldwide.

Niall Murphy, RelyAbility

Niall Murphy has worked in Internet infrastructure since the mid-1990s, specializing in large online services. He has worked with all of the major cloud providers from their Dublin, Ireland offices, and most recently at Microsoft, where he was global head of Azure Site Reliability Engineering (SRE). He is the instigator, co-author, and editor of the two Google SRE books, and he is probably one of the few people in the world to hold degrees in Computer Science, Mathematics, and Poetry Studies. He lives in Dublin with his wife and two children.

Narayan Desai, Google

Narayan is an SRE at Google Cloud, where he is responsible for the reliability of GCP Data Analytics products.

Laura Nolan, Slack Technologies

Laura Nolan is a Senior Staff Engineer and tech lead at Slack, working mainly on service networking and ingress load balancing, as well as occasionally writing outage reports for the Slack Engineering blog. Laura has contributed to a number of books on SRE, including Site Reliability Engineering: How Google Runs Production Systems, Seeking SRE, and 97 Things Every SRE Should Know. She also regularly writes for USENIX's ;login: magazine, and is a member of the USENIX board and SREcon Steering Committee.

Sandhya Ramu, LinkedIn

Sandhya Ramu is Sr. Director of Engineering at LinkedIn, where she leads the site reliability engineering team focused on big data and AI/ML platforms. She is a seasoned technology leader with close to two decades of web industry experience building and leading cross-functional teams. She is also passionate about culture, diversity, and inclusion, both at work and outside it, and actively participates in furthering this cause.

SREcon21 Open Access Sponsored by Indeed

BibTeX
@conference {276709,
author = {Kurt Andersen and Niall Murphy and Narayan Desai and Laura Nolan and Xiao Li and Sandhya Ramu},
title = {Panel: Unsolved Problems in {SRE}},
year = {2021},
publisher = {USENIX Association},
month = oct
}
