Practical Kubernetes Security Learning using Kubernetes Goat

Madhu Akula; Vickie Li; Travis Cotton

All the times listed below are in Pacific Daylight Time (PDT).

Attendee Files

LISA21 Attendee List (PDF)

Tuesday, June 1

7:45 am–8:00 am (PDT)

Opening Remarks

Program Co-Chairs: Carolyn Rowland, National Institute of Standards and Technology (NIST), and Avleen Vig, Facebook

8:00 am–9:30 am (PDT)

Keynote Addresses

Beyond Firefighter vs. Safety Matches: Growing the DevSecOps Pipeline

Tuesday, 8:00 am–8:45 am

Amélie Erin Koran, Splunk, Inc.

Available Media

Developing a pure security talent pipeline often suffers from overspecialization, with organizations looking for unicorns to fight the fires du jour. Conversely, DevSecOps may suffer from overgeneralizing the skills required to be a purposeful and successful practitioner, and from overloading tasks and responsibilities onto individuals rather than embracing the concept of shared responsibility. This talk will cover how to learn from both problems to develop solutions and techniques for successfully navigating modern systems development, operations, and security.

Amélie Erin Koran, Splunk, Inc.

Amélie is a Senior Technology Advocate at Splunk, who is focused on helping organizations transform, grow and secure themselves in the ever-evolving world of technologies and their accompanying challenges. She arrives at Splunk after nearly 25 years as a technologist, from systems administration and engineering to executive technology leadership in various industries, academia, NGOs, and the government. In the last decade, she's supported various Federal agencies, leading various projects and initiatives, including modernization activities, cybersecurity policy, and security architecture and operations. She is often seen "soapboxing" about technology workforce development, training, and recruiting policies, practices, and techniques. She's a serial volunteer who tries to return the help she's received in her own career through mentorship, conversation, and community building.

Connect:

@webjedi

Lessons Learned from a Ransomware Attack

Tuesday, 8:45 am–9:30 am

Ski Kacoroski, Northshore School District

Available Media

This talk covers a ransomware attack on a medium-sized school district (23K students, 4K staff). We start with the timeline of the attack as determined by forensic analysis, cover what was damaged in the attack, and then cover the recovery process. Then we'll discuss changes that were made to avoid and mitigate any future attacks. We wrap up with the lessons learned during this attack in the hope that they will help you avoid a ransomware attack, or recover more quickly if you do experience one.

After this talk you will have a better understanding of how an attack happens, what kind of alerts may be symptoms of an attack, what to do in case you are attacked, what happens after you are attacked, and what actions you can take now to avoid and mitigate a ransomware attack.

Ski Kacoroski, Northshore School District

Ski has done stints as a system admin, manager, consultant, professor, and emergency worker since 1982. He is currently a system admin at the Northshore School District in WA and an adjunct professor at Bellevue College. When not busy at a computer, Ski enjoys traveling, gardening, kayaking, and backpacking in the wonderful Pacific Northwest.

9:30 am–9:45 am (PDT)

Break

9:45 am–11:15 am (PDT)

Track I

"Disorganizing" Your SRE Organization

Tuesday, 9:45 am–10:30 am

Leonid Belkind, StackPulse

Available Media

More than a year ago, COVID-19 presented us with the challenge of how to establish and grow our SRE practice under the new conditions of working from home (while doubling and then tripling our team). Instead of trying to treat the situation in a "business as usual" manner, we embarked on a journey to reorganize our SRE practices, tools, and culture. In this talk, we will share the assumptions we started the process with, our learnings during the process, and the state we ended in, after more than a year of change.

Leonid Belkind, CTO - StackPulse

Leonid Belkind is a Co-Founder and CTO at StackPulse, a Site Reliability Engineering orchestration platform. Prior to StackPulse, Leonid co-founded (and was CTO of) Luminate where he guided this enterprise-grade service from inception to widespread Fortune 500 adoption to acquisition by Symantec. Before Luminate, Leonid managed software development organizations at CheckPoint.

Over the course of his career, Leonid has witnessed modern Software Engineering practices replace the traditional ones, first around Continuous Integration and Delivery pipelines, then Infrastructure Management and Monitoring, and onwards as software services have replaced on-premise products. Throughout this journey, Leonid has become passionate about building reliability-first architectures, methodologies, and organizational culture.

Connect:

@LBelkind

Everything We Did Wrong to Do Accessibility Right at BuzzFeed

Tuesday, 10:30 am–11:15 am

Plum Ertz, Ro, and Jack Reid, BuzzFeed

Available Media

Over the past two years, BuzzFeed's main hub of "listicles" and quizzes has been picked apart by third-party auditors and mindfully reconstructed by its tech team to meet WCAG 2.1 AA accessibility success criteria. From day one, the remediation and certification process was a case study in building accessibility from solid moral grounds and shaky organizational ones. Our team is proud of our work, but we did quite a bit wrong to get it right. In this talk, we'll share what we've learned from our mistakes so that you can avoid the same ones on your own compliance journey, and set yourself up for long-term success in building an accessible culture.

Plum Ertz, Ro

Plum Ertz (she/her) is a software engineer, baker of noms, and casual economist. Her main areas of expertise are on the front end, with a focus on accessibility and web standards. Plum is currently tackling accessibility and product growth in the patient experience as a Senior Engineering Manager at Ro.

Connect:

@plumertzi

Jack Reid, BuzzFeed

Jack Reid (he/him) is a software engineer living in South London. He spends most of his time worrying about accessibility or playing with his cat. He's currently Engineering Manager at BuzzFeed.

Connect:

@jackreid

Track II

Exploiting Brain Vulnerabilities: Chaotic Good at Scale

Tuesday, 9:45 am–10:30 am

Maia Sauren

Available Media

Have you ever tried convincing your high school friends that they should use a password manager?

This talk is about how to change people's minds and successfully introduce new concepts to groups and individuals without damaging relationships.

Each of us travels across cultures—work, home, multiple friendship groups. We even travel across micro-cultures—adjacent work teams, different arms of the family. We navigate groups with distinctive communication norms, consent norms, political awareness, technical knowledge, and assumptions of morality—and we do it several times a day.

How do we do that successfully? More importantly, how do we bring new concepts to existing groups without damaging relationships? We do it by accepting that normal is different everywhere, and it doesn't have to remain the same over time.

We will cover change management theory and practice, organisational transformation techniques, complexity communication, and theories of mind. The audience will leave with concrete tools for enabling organisational and group behaviour change, and adaptive communication strategies for translating contexts and concepts across domains.

Maia Sauren

Dr. Maia Sauren is a program and project manager, strategist, delivery team lead, and analyst, specialising in healthcare and data-intensive domains. Maia's background includes a biomedical engineering Ph.D. and a variety of international roles as technical educator, science communicator, and data scientist, with work spanning data science and architecture strategy, from large-scale organisational transformation to healthcare analytics applications for specialised and low-resource settings.

Connect:

@sauramaia

Re-imagining Management Methods for Distributed and Clustered Systems with Kraken/Layercake

Tuesday, 10:30 am–11:15 am

J. Lowell Wofford, Kevin Pelzel, and Travis Cotton, Los Alamos National Laboratory

Available Media

The overarching design of cluster system management stacks has not changed in decades. Most existing tooling works the same: set up netboot, configure some system "images," power on, and hope for the best. This set-it-and-leave-it approach is inadequate as systems grow in size and complexity. Modern systems need robust ways to automate systems management and enforce system states over time.

We have been rethinking the tooling for clustered systems. We introduce a new framework for distributed system automation, "Kraken," as well as a Kraken-based provisioning toolkit, "Layercake." Together they provide distributed, stateful provisioning and automation across clustered systems. Immediate advantages include: scalably and reliably initializing clusters from bare metal; self-healing capabilities for (some) failures; continuous system state enforcement; automated changes to configurations, personalities, and node images (often in microseconds); all while being declarative, idempotent, modular, and extensible. We will both present the Kraken/Layercake tooling and outline the core design principles.

J. Lowell Wofford, Los Alamos National Laboratory

J. Lowell Wofford is a scientist at Los Alamos National Laboratory in the HPC Design group. Over the past couple of decades, he has dabbled in many aspects of High-Performance Computing, from scientific algorithms to system design. Lowell's current work is on Cluster and Supercomputer design, including system hardware, high-speed networks, and system software architecture. Most recently, he has focused on novel ways to automate the management of very large distributed systems.

Kevin Pelzel, Los Alamos National Laboratory

Kevin is a scientist at Los Alamos National Laboratory. He graduated from the University of Wisconsin Stout in 2018 with a Bachelor's in Computer and Electrical Engineering and immediately started work at LANL's HPC division, first as a post-bach, then as a staff scientist. Since then he's been working in the HPC environments group, focused on developing tools for system management, such as automated system bring-up and maintenance, syslog analysis, and data transfer utilities.

Travis Cotton, Los Alamos National Laboratory

Travis is a scientist at Los Alamos National Laboratory in the HPC division. He graduated from New Mexico State University in 2013 with a Master's in Computer Science and has been working in HPC in various roles, starting as a research assistant in his Master's program and throughout his career. He started working at LANL as a scientist in 2018 in the HPC systems group, where he focuses on production computing, configuration management, and cluster image building.

11:15 am–12:00 pm (PDT)

Break

12:00 pm–1:40 pm (PDT)

Track I

Kind Engineering: How to Engineer Kindness

Tuesday, 12:00 pm–12:45 pm

Evan Smith, Solvemate

Available Media

Software Engineering is an environment that is rife with worrying stereotypes like the "Brilliant Jerk" and the "Peter Principle." How can you be a kind engineer and encourage kindness in engineering? For the last couple of years, Evan has been exploring what it means to be a Kind Engineer through books, media, and interviews. It's called different things in different circles, but ultimately it leads back to one fact: people who give more unconditionally make themselves happier, their teams happier, and their companies happier.

This talk will cover practical tips for becoming a kinder engineer through the following topics: code reviews, psychological safety, giving/receiving feedback and honesty in the workplace.

Evan Smith, Solvemate

Evan Smith is a Site Reliability Engineer with the remote German company Solvemate and is responsible for managing the infrastructure, CI/CD, incident response and monitoring, as well as promoting a culture of kindness and learning.

Connect:

@TheJokersThief

More Performant Cluster State Management Using Open Source Firmware and a Kraken

Tuesday, 12:45 pm–1:30 pm

Devon Bautista and J. Lowell Wofford, Los Alamos National Laboratory

Available Media

Often, vendor-provided firmware is proprietary and closed, which can present some hurdles in high-performance computing (HPC). Vendor firmware usually provides a generic way of bootstrapping systems because it has to accommodate many situations, but purpose-built clusters would benefit from more purpose-built firmware. The ability to customize system initialization more granularly would provide more control over the hardware. This could potentially increase boot efficiency and reduce boot times by eliminating unused features and introducing more useful ones, but proprietary firmware tends to limit the amount of fine-tuning that is possible. This talk will demonstrate a use case for open firmware in the context of HPC with the integration of Kraken, a distributed state management tool focused on managing stateless HPC clusters. It will demonstrate how open firmware can be leveraged to eliminate unnecessary steps in the boot process of nodes, as well as to provision them more reliably.

Devon Bautista, Los Alamos National Laboratory

Devon is a post-masters student at Los Alamos National Laboratory working under the New Mexico Consortium. He completed his Bachelor of Science in Computer Systems Engineering in 2019 and Master of Science in Computer Engineering in 2020, both at Arizona State University, and started working at LANL as a summer intern in 2019. He currently works in LANL's HPC design group, focusing on system initialization, management, and provisioning from a low-level perspective.

J. Lowell Wofford, Los Alamos National Laboratory

J. Lowell Wofford is a scientist at Los Alamos National Laboratory in the HPC Design group. Over the past couple of decades, he has dabbled in many aspects of High-Performance Computing, from scientific algorithms to system design. Lowell's current work is on Cluster and Supercomputer design, including system hardware, high-speed networks, and system software architecture. Most recently, he has focused on novel ways to automate the management of very large distributed systems.

Track II (Core Principles)

5 Years of Cgroup v2: The Future of Linux Resource Control

Tuesday, 12:00 pm–12:45 pm

Chris Down, Facebook

Core Principles

Available Media

Control groups (or cgroups for short) are one of the most fundamental technologies underpinning our modern love of containerisation and resource control. Back in 2016, we released a complete overhaul of how cgroups work internally: cgroup v2, released with Linux 4.5. This brought many new and exciting possibilities to increase system stability and throughput, but with those possibilities have also come challenges of a type which we have largely not faced in Linux before.

This talk will go into some of the challenges faced in overhauling Linux's resource isolation and control capabilities, and how we've gone about fixing them. This will include some of the most complex and counter-intuitive practical effects we've seen in production, with details of how our expectations and knowledge have developed over the last 5 years using this on over a million machines in production, with insights that are immediately applicable to anyone who runs Linux at scale.

We will also go over the state-of-the-art of resource control in the "real world" outside of companies like Facebook and Google, looking at how cgroup v2 is changing the technical landscape for distributions and containerisation technologies for the better.

Chris Down, Facebook

Chris Down is an engineer on the Facebook kernel team, primarily working on cgroups and overall memory management strategy. He is responsible for debugging and resolving major production issues and improving the reliability and efficiency of Facebook's systems, and is also a maintainer of systemd.

Connect:

@unixchris

BPF Internals

Tuesday, 12:45 pm–1:30 pm

Brendan Gregg

Core Principles

Available Media

Extended BPF (aka eBPF) is a new type of software for secure, performant, event-driven programs, and has seen widespread adoption. Your Linux servers may already be running BPF programs; Netflix cloud instances run 15 by default, and Facebook over 40. These programs are for networking, performance tools, security policies, device drivers, application proxies, and more. Many have said that BPF is taking over Linux.

This talk is a deep dive that describes how BPF works internally and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.

Brendan Gregg

Brendan Gregg is an industry expert in computing performance and cloud computing. He is a senior performance architect at Netflix, where he does performance design, evaluation, analysis, and tuning. He is the author of Systems Performance and BPF Performance Tools (Addison-Wesley), and received the USENIX LISA Award for Outstanding Achievement in System Administration. Brendan has created numerous performance analysis tools, visualizations, and methodologies for performance analysis, including flame graphs.

Connect:

@brendangregg

1:40 pm–1:55 pm (PDT)

Break

1:55 pm–2:55 pm (PDT)

Lightning Talks

Below: Interactive Resource Monitor for Modern Linux Systems

Tuesday, 1:55 pm–2:10 pm

Daniel Xu, Facebook

Available Media

Stop Writing Your Own Infrastructure

Tuesday, 2:10 pm–2:25 pm

Ben Cotton

Available Media

Let's Go to the Colonies! OS Upgrades—the When, the Why, and the How

Tuesday, 2:25 pm–2:40 pm

Caskey Dickson

Available Media

Adding Metrics Support to OpenTelemetry

Tuesday, 2:40 pm–2:55 pm

Alolita Sharma, AWS

Available Media

4:00 pm–5:00 pm (PDT)

LISA21 Sponsor Events

View the list of LISA21 Sponsor Events.