LISA21 Conference Program

All the times listed below are in Pacific Daylight Time (PDT).

Tuesday, June 1

7:45 am–8:00 am (PDT)

Opening Remarks

Program Co-Chairs: Carolyn Rowland, National Institute of Standards and Technology (NIST), and Avleen Vig, Facebook

8:00 am–9:30 am (PDT)

Keynote Addresses

Beyond Firefighter vs. Safety Matches: Growing the DevSecOps Pipeline

Tuesday, 8:00 am–8:45 am

Amélie Erin Koran, Splunk, Inc.

Developing a pure security talent pipeline often suffers from overspecialization, looking for unicorns to fight the fires du jour in your organization. Conversely, DevSecOps may suffer from overgeneralizing the skills required to be a purposeful and successful practitioner, and from overloading tasks and responsibilities onto individuals rather than embracing the concept of shared responsibility. This talk will cover trying to learn from both problems to develop solutions and techniques to successfully navigate modern systems development, operations and security.

Amélie Erin Koran, Splunk, Inc.

Amélie is a Senior Technology Advocate at Splunk, who is focused on helping organizations transform, grow and secure themselves in the ever-evolving world of technologies and their accompanying challenges. She arrives at Splunk after nearly 25 years as a technologist, from systems administration and engineering to executive technology leadership in various industries, academia, NGOs, and the government. In the last decade, she's supported various Federal agencies, leading various projects and initiatives, including modernization activities, cybersecurity policy, and security architecture and operations. She is often seen "soapboxing" about technology workforce development, training and recruiting policies, practices and techniques. She's a serial volunteer who tries to return the help she's received in her own career through mentorship, conversation, and community building.

Lessons Learned from a Ransomware Attack

Tuesday, 8:45 am–9:30 am

Ski Kacoroski, Northshore School District

This talk covers a ransomware attack on a medium-size school district (23K students, 4K staff). We start with the timeline of the attack that was determined by forensic analysis, cover what was damaged in the attack, and then cover the attack recovery process. Then we'll discuss changes that were made to avoid and mitigate any future attacks. We wrap up with the lessons learned during this attack in the hope that they will help you avoid a ransomware attack, or recover more quickly if you do experience one.

After this talk you will have a better understanding of how an attack happens, what kind of alerts may be symptoms of an attack, what to do in case you are attacked, what happens after you are attacked, and what actions you can take now to avoid and mitigate a ransomware attack.

Ski Kacoroski, Northshore School District

Ski has done stints as a system admin, manager, consultant, professor, and emergency worker since 1982. He currently is a system admin at the Northshore School District in WA and an adjunct professor at Bellevue College. When not busy at a computer, Ski enjoys traveling, gardening, kayaking, and backpacking in the wonderful Pacific Northwest.

9:30 am–9:45 am (PDT)

Break

9:45 am–11:15 am (PDT)

Track I

"Disorganizing" Your SRE Organization

Tuesday, 9:45 am–10:30 am

Leonid Belkind, CTO - StackPulse

More than a year ago, COVID-19 presented us with the challenge of establishing and growing our SRE practice under the new conditions of working from home (while doubling and then tripling our team). Instead of trying to treat the situation in a "business as usual" manner, we embarked on a journey to reorganize our SRE practices, tools, and culture. In this talk, we will share the assumptions we started the process with, our learnings during the process, and the state we ended in, after more than a year of change.

Leonid Belkind, CTO - StackPulse

Leonid Belkind is a Co-Founder and CTO at StackPulse, a Site Reliability Engineering orchestration platform. Prior to StackPulse, Leonid co-founded (and was CTO of) Luminate where he guided this enterprise-grade service from inception to widespread Fortune 500 adoption to acquisition by Symantec. Before Luminate, Leonid managed software development organizations at CheckPoint.

Through his career, Leonid has witnessed modern Software Engineering practices come and replace the traditional ones, first around Continuous Integration and Delivery pipelines, then Infrastructure Management and Monitoring, and onwards as software services have replaced on-premise products. Throughout this journey Leonid has become passionate about building reliability-first architectures, methodologies, and organizational culture.

Everything We Did Wrong to Do Accessibility Right at BuzzFeed

Tuesday, 10:30 am–11:15 am

Plum Ertz, Ro, and Jack Reid, BuzzFeed

Over the past two years, BuzzFeed's main hub of "listicles" and quizzes has been picked apart by third-party auditors and mindfully reconstructed to meet WCAG 2.1 AA accessibility success criteria by its tech team. From day one, the remediation and certification process was a case study in building accessibility from solid moral grounds and shaky organizational ones. Our team is proud of our work, but we did quite a bit wrong to get it right. In this talk, we'll share what we've learned from our mistakes so that you can avoid the same ones on your own compliance journey, and set yourself up for long-term success in building accessible culture.

Plum Ertz, Ro

Plum Ertz (she/her) is a software engineer, baker of noms, and casual economist. Her main areas of expertise are on the front end, with a focus on accessibility and web standards. Plum is currently tackling accessibility and product growth in the patient experience as a Senior Engineering Manager at Ro.

Jack Reid, BuzzFeed

Jack Reid (he/him) is a software engineer living in South London. He spends most of his time worrying about accessibility or playing with his cat. He's currently Engineering Manager at BuzzFeed.

Track II

Exploiting Brain Vulnerabilities: Chaotic Good at Scale

Tuesday, 9:45 am–10:30 am

Maia Sauren

Have you ever tried convincing your high school friends that they should use a password manager?

This talk is about how to change people's minds and successfully introduce new concepts to groups and individuals without damaging relationships.

Each of us travels across cultures—work, home, multiple friendship groups. We even travel across micro-cultures—adjacent work teams, different arms of the family. We navigate groups with distinctive communication norms, consent norms, political awareness, technical knowledge, and assumptions of morality—and we do it several times a day.

How do we do that successfully? More importantly, how do we bring new concepts to existing groups without damaging relationships? We do it by accepting that normal is different everywhere, and it doesn't have to remain the same over time.

We will cover change management theory and practice, organisational transformation techniques, complexity communication, and theories of mind. The audience will leave with concrete tools for enabling organisational and group behaviour change, and adaptive communication strategies for translating contexts and concepts across domains.

Maia Sauren

Dr. Maia Sauren is a program and project manager, strategist, delivery team lead, and analyst, specialising in healthcare and data-intensive domains. Maia's background includes a biomedical engineering Ph.D. and a variety of international roles as technical educator, science communicator, and data scientist, with work spanning data science and architecture strategy, from large-scale organisational transformation to healthcare analytics applications for specialised and low-resource settings.

Re-imagining Management Methods for Distributed and Clustered Systems with Kraken/Layercake

Tuesday, 10:30 am–11:15 am

J. Lowell Wofford, Kevin Pelzel, and Travis Cotton, Los Alamos National Laboratory

The overarching design of cluster system management stacks has not changed in decades. Most existing tooling works the same: set up netboot, configure some system "images," power on, and hope for the best. This set-it-and-leave-it approach is inadequate as systems grow in size and complexity. Modern systems need robust ways to automate systems management and enforce system states over time.

We have been rethinking the tooling for clustered systems. We introduce a new framework for distributed system automation, "Kraken," as well as a Kraken-based provisioning toolkit, "Layercake." Together they provide distributed, stateful provisioning and automation across clustered systems. Immediate advantages include: scalably and reliably initializing clusters from bare metal; self-healing capabilities for (some) failures; continuous system state enforcement; automated changes to configurations, personalities, and node images (often in microseconds); all while being declarative, idempotent, modular & extensible. We will present both the Kraken/Layercake tooling and outline the core design principles.

J. Lowell Wofford, Los Alamos National Laboratory

J. Lowell Wofford is a scientist at Los Alamos National Laboratory in the HPC Design group. Over the past couple of decades, he has dabbled in many aspects of High-Performance Computing, from scientific algorithms to system design. Lowell's current work is on Cluster and Supercomputer design, including system hardware, high-speed networks, and system software architecture. Most recently, he has focused on novel ways to automate the management of very large distributed systems.

Kevin Pelzel, Los Alamos National Laboratory

Kevin is a scientist at Los Alamos National Laboratory. He graduated from the University of Wisconsin-Stout in 2018 with a Bachelor's in Computer and Electrical Engineering and immediately started work at LANL's HPC division, first as a post-bach, then as a staff scientist. Since then he's been working in the HPC environments group focused on developing tools for system management, such as automated system bring up and maintenance, syslog analysis, and data transfer utilities.

Travis Cotton, Los Alamos National Laboratory

Travis is a scientist at Los Alamos National Laboratory in the HPC division. He graduated from New Mexico State University in 2013 with a Master's in Computer Science and has been working in HPC in various roles, starting as a research assistant in his Master's program and throughout his career. He started working at LANL as a scientist in 2018 in the HPC systems group, where he focuses on production computing, configuration management, and cluster image building.

11:15 am–12:00 pm (PDT)

Break

12:00 pm–1:30 pm (PDT)

Track I

Kind Engineering: How to Engineer Kindness

Tuesday, 12:00 pm–12:45 pm

Evan Smith, Solvemate

Software Engineering is an environment that is rife with worrying stereotypes like the "Brilliant Jerk" and the "Peter Principle." How can you be a kind engineer and encourage kindness in engineering? For the last couple of years, Evan has been exploring what it means to be a Kind Engineer through books, media and interviews. It's called different things in different circles but ultimately it leads back to one fact: people who give more unconditionally make themselves happier, their teams happier and their companies happier.

This talk will cover practical tips for becoming a kinder engineer through the following topics: code reviews, psychological safety, giving/receiving feedback and honesty in the workplace.

Evan Smith, Solvemate

Evan Smith is a Site Reliability Engineer with the remote German company Solvemate and is responsible for managing the infrastructure, CI/CD, incident response and monitoring, as well as promoting a culture of kindness and learning.

More Performant Cluster State Management Using Open Source Firmware and a Kraken

Tuesday, 12:45 pm–1:30 pm

Devon Bautista and J. Lowell Wofford, Los Alamos National Laboratory

Often, vendor-provided firmware is proprietary and closed, which can present some hurdles in high-performance computing (HPC). Vendor firmware usually provides a generic way of bootstrapping systems, having to accommodate many situations, but purpose-built clusters would benefit from more purpose-built firmware. The ability to customize the system initialization more granularly would provide more control over the hardware. This could potentially increase boot efficiency and reduce boot times by eliminating unused features and introducing more useful ones, but proprietary firmware tends to limit the amount of fine tuning that is possible. This talk will demonstrate a use case for open firmware in the context of HPC with the integration of Kraken, a distributed state management tool focused on managing stateless HPC clusters. It will demonstrate how open firmware can be leveraged for eliminating non-necessities in the boot process of nodes, as well as for provisioning them more reliably.

Devon Bautista, Los Alamos National Laboratory

Devon is a post-masters student at Los Alamos National Laboratory working under the New Mexico Consortium. He completed his Bachelor of Science in Computer Systems Engineering in 2019 and Master of Science in Computer Engineering in 2020, both at Arizona State University, and started working at LANL as a summer intern in 2019. He currently works in LANL's HPC design group, focusing on system initialization, management, and provisioning from a low-level perspective.

J. Lowell Wofford, Los Alamos National Laboratory

J. Lowell Wofford is a scientist at Los Alamos National Laboratory in the HPC Design group. Over the past couple of decades, he has dabbled in many aspects of High-Performance Computing, from scientific algorithms to system design. Lowell's current work is on Cluster and Supercomputer design, including system hardware, high-speed networks, and system software architecture. Most recently, he has focused on novel ways to automate the management of very large distributed systems.

Track II (Core Principles)

5 Years of Cgroup v2: The Future of Linux Resource Control

Tuesday, 12:00 pm–12:45 pm

Chris Down, Facebook

Core Principles

Control groups (or cgroups for short) are one of the most fundamental technologies underpinning our modern love of containerisation and resource control. Back in 2016, we released a complete overhaul of how cgroups work internally: cgroup v2, which shipped with Linux 4.5. This brought many new and exciting possibilities to increase system stability and throughput, but with those possibilities have also come challenges of a type which we have largely not faced in Linux before.

This talk will go into some of the challenges faced in overhauling Linux's resource isolation and control capabilities, and how we've gone about fixing them. This will include some of the most complex and counter-intuitive practical effects we've seen in production, with details of how our expectations and knowledge have developed over the last 5 years using this on over a million machines in production, with insights that are immediately applicable to anyone who runs Linux at scale.

We will also go over the state-of-the-art of resource control in the "real world" outside of companies like Facebook and Google, looking at how cgroup v2 is changing the technical landscape for distributions and containerisation technologies for the better.

Chris Down, Facebook

Chris Down is an engineer on the Facebook kernel team, primarily working on cgroups and overall memory management strategy. He is responsible for debugging and resolving major production issues and improving the reliability and efficiency of Facebook's systems, and is also a maintainer of systemd.

BPF Internals

Tuesday, 12:45 pm–1:30 pm

Brendan Gregg

Core Principles

Extended BPF (aka eBPF) is a new type of software for secure, performant, event-driven programs, and has seen widespread adoption. Your Linux servers may already be running BPF programs; Netflix cloud instances run 15 by default, and Facebook over 40. These programs are for networking, performance tools, security policies, device drivers, application proxies, and more. Many have said that BPF is taking over Linux.

This talk is a deep dive that describes how BPF works internally and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.

Brendan Gregg

Brendan Gregg is an industry expert in computing performance and cloud computing. He is a senior performance architect at Netflix, where he does performance design, evaluation, analysis, and tuning. He is the author of Systems Performance and BPF Performance Tools (Addison-Wesley), and received the USENIX LISA Award for Outstanding Achievement in System Administration. Brendan has created numerous performance analysis tools, visualizations, and methodologies for performance analysis, including flame graphs.

1:30 pm–1:45 pm (PDT)

Break

1:45 pm–2:45 pm (PDT)

Lightning Talks

Wednesday, June 2

8:00 am–9:30 am (PDT)

Plenary Session

Computing Performance: On the Horizon

Wednesday, 8:00 am–8:45 am

Brendan Gregg

The chase for higher performance in computing is pervasive: it is the driving reason for many new technologies and a common feature of updates. While we can expect incremental performance improvements to our existing software and hardware (with Moore's law for processors a well-known example), it is harder to predict new technologies. This talk discusses the current performance improvements that you will likely be adopting for processors, memory, disks, networking, runtimes, hypervisors, and more, as well as discussing where things are headed with predictions for new technologies. The future of performance is increasingly cloud-based with hardware hypervisors and custom processors, meaningful observability of everything down to cycle stalls (even as cloud guests), and high-speed syscall-avoiding applications that use BPF, FPGAs, and io_uring.

Brendan Gregg

Brendan Gregg is an industry expert in computing performance and cloud computing. He is a senior performance architect at Netflix, where he does performance design, evaluation, analysis, and tuning. He is the author of Systems Performance and BPF Performance Tools (Addison-Wesley), and received the USENIX LISA Award for Outstanding Achievement in System Administration. Brendan has created numerous performance analysis tools, visualizations, and methodologies for performance analysis, including flame graphs.

Performance Analysis of XDP Programs

Wednesday, 8:45 am–9:30 am

Zachary H. Jones, Verizon Media

One of the many opportunities presented by BPF is the ability to move network processing down from the higher levels of the kernel, closer to the hardware. This allows for manipulating, dropping, and transmitting packets without the costs of going through the full kernel networking stack. However, doing so introduces fresh complexity and results in significantly reduced visibility. Therefore, in order to develop performant systems, care must be taken in analyzing and measuring system performance. Here, we outline our current approach, which accommodates these new challenges: using flame graphs to visualize processing budgets, microbenchmarking eBPF helpers and features, and viewing annotated assembly code with utilization percentages.

Zachary H. Jones, Verizon Media

Zachary Jones is a performance and kernel engineer at Verizon Media, where he does performance measurement, analysis, and tuning along with systems and performance architecture of the Verizon Media Platform CDN. Zach received his Ph.D. from Clemson University in 2010. Since then, he has gained over 10 years of performance and kernel engineering experience with previous roles at IBM and NetApp.

9:30 am–9:45 am (PDT)

Break

9:45 am–11:15 am (PDT)

Track I

Building Community with CentOS Stream

Wednesday, 9:45 am–10:30 am

Davide Cavalca, Facebook

With the introduction of CentOS Stream, it is now possible to contribute to CentOS directly. This talk will go over Facebook's experience working with CentOS (the distro, the project, the community), growing from consumer, to contributor, to founding member of the new Hyperscale SIG, which strives to facilitate collaboration around large-scale CentOS deployments.

Davide Cavalca, Facebook

Davide Cavalca is a Production Engineer at Facebook, where he is currently leading the fleet migration to CentOS Stream 8. Davide has been working in the systems space for over 10 years, always with a strong focus towards open source and automation.

How to Apply GitOps on Infrastructure (for Real)

Wednesday, 10:30 am–11:15 am

Viktor Farcic, Shipa

When managing infrastructure today, we have a few requirements beyond the obvious ones, such as defining everything as code.

  • It should be based on GitOps principles
  • It should detect drifts automatically and converge the actual (infra) into the desired (Git) state automatically
  • It should be based on a common API, potentially the same one we are using for managing other states (e.g., applications)

In this session, we will explore Crossplane combined with Argo CD as a potential solution for all our infrastructure needs and see whether it fits into the broader Kubernetes ecosystem.

Viktor Farcic, Shipa

Viktor Farcic is a Developer Advocate at Shipa, a member of the Google Developer Experts and Docker Captains groups, and a published author.

He is a host of the YouTube channel "DevOps Toolkit" and a co-host of "DevOps Paradox." He published "The DevOps Toolkit Series" and "Test-Driven Java Development."

Track II

Organizational Design for Technical Emergency Response in Distributed Computing Systems

Wednesday, 9:45 am–10:30 am

Adrienne Walcer and Alexander Perry, Google Inc.

When a company critically relies on the ongoing functioning of a complex and highly interconnected technical stack, support of that stack implies that appropriate personnel be reliably available to troubleshoot and correct issues that occur. These personnel will be referred to as responders. When the scope of a technical stack grows beyond one person's capacity to understand and maintain state, we split up the technical stack such that multiple responders can each provide coverage on a single component of the whole stack. Such a highly interconnected system-of-systems (SoS) allows production issues to cascade throughout wide swaths of the SoS, or sneak in between system-to-system (StS) boundaries. We will here explore one private industry implementation of a responder group designed to respond to emergent distributed computing SoS failures. In contrasting the functions of component responders and SoS responders, we demonstrate that the component ownership skill set is distinguishable from the core skill set of an SoS responder. Technical organizations can benefit from setting up SoS response to enable expedient distributed system outage mitigation.

Adrienne Walcer, Google Inc.

Adrienne has been at Google for 8 years, currently as a Technical Program Manager in Site Reliability Engineering (SRE). She is the program lead for Incident Management, and focuses on the lifecycle of large scale emergencies. Before Google, Adrienne was a Data Scientist at Explorys Inc. She studied Biostatistics at the University of Rochester and is currently pursuing a Master of Science in Systems Engineering at George Washington University.

Alexander Perry, Google Inc.

Dr. Perry received his Ph.D. and Masters in Engineering from the University of Cambridge, England, completing research to develop new techniques for precision electromagnetic characterization of superconductors. He has worked at Google for 15 years as a Staff Site Reliability Engineer (SRE) on high performance network technologies and their associated security systems. He currently leads testing programs in support of disaster resiliency.

Groove with Ambiguity: The Robust, the Reliable, and the Resilient

Wednesday, 10:30 am–11:15 am

Matt Davis, Blameless

The networked software systems we build are increasing in complexity every moment. Today the most successful builders and operators are embracing complexity through CI/CD, Chaos Engineering, and innovation in Incident Response. They realize that the adaptive world around us is advancing at such breakneck speed that it is leaving our capacity to understand it in the dust, and that humans and technology must race a gauntlet of automation surprises and collaboration challenges as a team, learning and improving along the way. This session showcases methods of deploying, running, and navigating complexity. It offers a practical view of how software systems can scale and remain robust to failure (like fallbacks or high availability), achieve highly reliable socio-technical operations (via runbooks and game days), and adapt to surprise through techniques of resilience engineering (graceful extensibility and building for adaptation).

Matt Davis, Blameless

Just as at home with analog synthesizer electronics as with Infrastructure as Code, Matt Davis finds joy in operating chaotic complex systems. His variegations include data-center operations, storage hardware, distributed databases, network security, site reliability engineering, NOC support, observability systems, and techops leadership. Matt's passion for exploring the relationships between the artistic mind and operating distributed software architectures is reflected in his themes for both technology talks and musical output, seeking out diverse ways to learn from our adaptive universe.

11:15 am–12:00 pm (PDT)

Break

12:00 pm–1:30 pm (PDT)

Track I

Sustainable Software Engineering

Wednesday, 12:00 pm–12:45 pm

Bill Johnson

An SRE's primary goal is to balance the technical and operational aspects of a system to drive reliability. Reliability and sustainability are tightly coupled, which means SREs are uniquely positioned to balance a 3rd area: Environmental sustainability. This talk will detail these 3 areas and how they relate to SREs, and will provide some tangible examples of what you can do in your team today to be a champion of Technical, Operational, and Environmental sustainability efforts.

AppSec Fundamentals for Modern DevOps

Wednesday, 12:45 pm–1:30 pm

Suchakra Sharma and Vickie Li, ShiftLeft Inc.

The complexity of modern applications and their deployments means that DevOps needs to wear the security hat from time to time. AppSec knowledge can help DevOps engineers plan their deployments and contingency plans, and communicate more effectively with the Security and Development teams.

In this talk, we will introduce the principles of application security. We will first talk about the industry-standard OWASP Top 10 vulnerabilities. We will then discuss the secure development lifecycle and how to implement security measures in each step. Finally, we will talk about how security teams can build an AppSec program in their organization to continuously improve their security posture.

Suchakra Sharma, ShiftLeft Inc.

Suchakra Sharma is Staff Scientist at ShiftLeft Inc. where he builds code analysis tools and hunts security bugs. He completed his Ph.D. in Computer Engineering from Polytechnique Montréal where he worked on eBPF technology and hardware-assisted tracing techniques for OS analysis. As part of his research, he also developed one of the first hardware-trace based virtual machine analysis techniques. He has delivered talks and trainings at venues such as RSA, USENIX LISA, SCALE, Papers We Love, Tracing Summit, etc. When not playing with computers, he hikes and writes poems.

Track II (Core Principles)

Protecting System Integrity with Trusted Platform Module

Wednesday, 12:00 pm–12:45 pm

Dmitrii Potoskuev, Facebook

Core Principles

Every software and firmware component running on a system can be the vector for delivering an attack to the host itself and the wider infrastructure around it. We often focus on protecting the system from what runs in user space or kernel space, and we don't always include in our threat model the integrity of the lower layers in the stack. In this talk, we want to show what could be the impact of compromising a host through a persistent implant in its system firmware. We will focus specifically on UEFI, the industry-wide standard that defines how system firmware should operate. We will demonstrate a "hello-world" system firmware malware from its development to its injection on the host. We will then introduce the concept of Trusted Platform Module, a secure cryptoprocessor that has become an industry standard on consumer and enterprise systems, and explain how the TPM can help protect the platform from our demonstrative malware. We will assume that our system requires secrets to be able to interface with the infrastructure around and we will leverage the TPM to give the host access to those secrets only if we can guarantee that all layers of the stack have not been compromised.

Dmitrii Potoskuev, Facebook

Dmitrii is a Production Engineer at Facebook focusing on Trusted Platform Module applications and related server life cycle workflows. Previously Dmitrii worked in Telecom, System Integration and Retail industries as a software developer and solution engineer.

The Cornerstone for Cybersecurity—Cryptographic Standards

Wednesday, 12:45 pm–1:30 pm

Lily Chen

Core Principles

This presentation will introduce NIST Cryptographic Standards and their applications in cybersecurity. The presentation will also discuss transitions and validations. It highlights challenges and solutions for next generation cryptographic standards, including challenges to deal with quantum threats, new cryptography transition, and lightweight cryptography for constrained devices.

Lily Chen

Dr. Lily (Lidong) Chen is a mathematician and heads the Cryptographic Technology Group in the Computer Security Division at NIST. Her team has been developing cryptographic standards published in Federal Information Processing Standards (FIPS) and NIST Special Publications (SP). The team is currently devoted to developing the next generation of cryptographic standards, including post-quantum cryptography and lightweight cryptography for constrained environments, as well as work in many advanced cryptographic areas.

1:30 pm–1:45 pm (PDT)

Break

1:45 pm–2:45 pm (PDT)

Popcorn Talks

Popcorn talks are informal, short, silly, and fun talks! Speakers will be given a surprise set of slides and will have 5 minutes max (feel free to do less) to adlib a short talk based on their contents. Expect lots of GIFs, memes, and extremely silly slides, which may or may not be related to tech. A good example of popcorn talks is from DevOpsDays Columbus.

While these talks are obviously unscripted, speakers are still expected to follow the USENIX Code of Conduct, so make sure to review it before presenting.

Sign-ups are first-come, first-served until we run out of time in the session! If you're interested in doing a popcorn talk, please sign up before May 28.

Thursday, June 3

8:00 am–9:30 am (PDT)

Track I

Crypto Agility: Adapting and Prioritizing Security in a Fast-Paced World

Thursday, 8:00 am–8:45 am

Chujiao Ma, Comcast Cable Communications, LLC

Crypto agility refers to the ability to replace existing crypto primitives, algorithms, or protocols with a new alternative quickly and inexpensively, with no or acceptable risk exposure. These changes may be driven by regulatory action, advances in computing, or newly discovered vulnerabilities. Yet everyday operational needs may put crypto agility considerations on the back burner when deploying technology, designing processes, or developing products/services. Consequently, changes are often performed in an ad hoc manner. Transition from one crypto solution to another can then take a long time and expose organizations to unnecessary security risk. This presentation presents a framework to analyze and evaluate the risk that results from the lack of crypto agility. The proposed framework can be used by organizations to determine an appropriate mitigation strategy commensurate with their risk tolerance. We demonstrate the application of this framework with a case study of quantum computing threats to cryptography.

Chujiao Ma, Comcast Cable Communications, LLC

Chujiao Ma is a security research and development engineer at Comcast. Her research covers a wide range of topics, from de-identification of data, crypto agility, open source, and quantum computing to security metrics. Chujiao holds a Ph.D. in Computer Science & Engineering from the University of Connecticut and a bachelor's degree in Electrical and Computing Engineering from Franklin W. Olin College of Engineering.

The Remote Working Security Conundrum: What Is Reasonably Secure Anyway?

Thursday, 8:45 am–9:15 am

Alex Sharp, OrionVM

Securing a remote work environment presents a unique set of challenges: from possibly adversarial networks and insecure physical environments through to authentication and segmentation hurdles. This talk runs through the stack, from physical security through to the application layer, highlighting interesting, important, or novel technology choices we made and the reasoning behind them. Examples include the use of Qubes OS to allow virtualization to segment potentially sensitive web browsers, or the Heads firmware image to secure the boot environment via the TPM.

Alex Sharp, OrionVM

Alex Sharp is the CTO and cofounder of OrionVM, a high performance wholesale cloud computing provider. He was one of the Forbes 30 under 30 in 2016, a Hills Young Innovator of the Year, and is an ardent supporter of the Mars Society, believing resolutely that humanity must become a multi-planetary species.

Track II

Why You Should Burn Down Your Datacenter

Thursday, 8:00 am–8:45 am

Mike Elkin, Facebook

We're quite familiar with managing resources like CPU, memory, bandwidth, storage, I/O—but hidden underneath all that is the power, space, and cooling systems. Pushing the limits of these underlying resources means building a connection between the datacenter facilities & your infrastructure management services. Optimizing requires understanding the very unique industrial controls environments and working around the many issues they have. Come listen to a completely rational tale on why we should go on such adventures, and why we'll be so unhappy with them we want to burn the datacenter to the ground.

Mike Elkin, Facebook

Mike has been a Production Engineer at Facebook for over 8 years, working across a number of different infrastructure systems and spaces. This includes work on kernel upgrade automation, active network health monitoring, and rack maintenance service frameworks, all of which have now been deprecated. Besides writing deprecated solutions, he also enjoys the occasional Internet spaceships/spreadsheets simulator, Irish whiskeys, and cheese.

Selectively Sharing Multipath Routes in BGP

Thursday, 8:45 am–9:15 am

Trisha Biswas, Fastly, Inc.

Border Gateway Protocol (BGP) is the most widely used network protocol to distribute routing information between network service providers. Traditionally, BGP speakers propagate only the best path to a given address prefix over a session. This achieves better scaling at the cost of path diversity. BGP add-paths (RFC 7911) allows sharing of multiple paths for the same prefix, helping achieve faster re-convergence. In today's networks, however, simply enabling add-paths could result in sharing millions of routes with a peer, potentially overwhelming it.

Selective advertisement of multiple paths allows a BGP speaker to pick additional paths to share, based on other route attributes. This extension, currently an IETF draft, helps unlock the potential of purposefully sending additional routes, as opposed to bombarding the peer with millions of extra routes. In this talk, we will present a use-case for the application of selective add-paths, along with a working demonstration of the feature.

Trisha Biswas, Fastly, Inc.

Trisha is a networking researcher with several years of experience in cloud network infrastructure and routing design. She works as a senior software engineer at Fastly, Inc., a company that provides an edge cloud platform for faster content delivery. Her current work involves design of the network control plane in edge cloud networks, where routing decisions are made to optimize traffic flow. She obtained her Ph.D. from North Carolina State University, focusing on resilient routing protocols for wireless ad hoc networks. In her spare time, she likes to travel, hike, backpack and sing.

9:30 am–9:45 am (PDT)

Break

9:45 am–11:15 am (PDT)

Track I

Hands-Off Testing for Networked Filesystems

Thursday, 9:45 am–10:30 am

Daria Phoebe Brashear, AuriStor, Inc.

Cross-platform network filesystems require testing, but in-kernel interface testing is problematic under the best of circumstances. This talk will discuss the techniques used at AuriStor for automating hands-off testing using buildbot, TAP, docker, and kvm.

Daria Phoebe Brashear, AuriStor, Inc.

Daria has been working with distributed network filesystems since attending Carnegie Mellon as an undergraduate. She writes software for money, stories and poetry for fun, bikes for pleasure, and chases trains.

Leveraging AFS Storage Systems to Ease Global Software Deployment

Thursday, 10:30 am–11:15 am

Tracy J. Di Marco White, Goldman Sachs

Using AFS as both a file store and an object store, we provide software to hundreds of thousands of client systems within both public and private cloud. As we see a continual increase in the frequency of software deployments, in the number of different software packages, and in the number of versions of each software package, we have also adapted our software deployment systems. Both of our software deployment systems use AFS, but one is unaware of AFS, and one makes specific use of various AFS features. I'll cover how the infrastructure has grown from several private data centers, and how our use of AFS has eased migration to both private and public cloud. I'll discuss the changes we are making to both the AFS-unaware and AFS-aware deployment systems, as well as discuss bugs, bottlenecks, and patterns of software development and usage that we've discovered through the change process.

Tracy J. Di Marco White, Goldman Sachs

Tracy has been herding infrastructure since before graduating from college, where she learned to use, then manage, distributed systems. After spending a couple of decades as part of Iowa State University's IT organizations, she accepted a position at Goldman Sachs, where she has even more infrastructure to herd. A home automation fan for even longer, she has lately taken to using voice assistants to tell her home automation systems what to do.

Track II

Year One: Transitioning From Application Engineer to Infrasec Engineer

Thursday, 9:45 am–10:30 am

Misty Hall, TrussWorks

No one emerges from the womb an Infrastructure Engineer (except maybe John Allspaw), and the question of which skills gaps constitute acceptable hiring risk is highly contested. After a year of struggles, triumphs, and self-reflection, I humbly present my context as a case study in the Sisyphean task of remedying skills gaps in a field where lifelong learning is a given.

Misty Hall, TrussWorks

Misty Hall is an Infrastructure Engineer at TrussWorks. She landed in the Infrasec practice by way of Application Engineering in industries ranging from healthcare to Bitcoin fintech. She likes Terraform and alligators.

SkillOps: Real-World Approaches in Skilling and Building World-Class Security & Technology Teams for a Remote-First World

Thursday, 10:30 am–11:15 am

Abhay Bhargav, we45

2020 has accelerated a few trends.

One: Remote/hybrid working is here to stay. Most companies are planning for it.

Two: Organizations are increasingly and relentlessly moving to the cloud and more distributed computing setups, with wider adoption of cloud-native technologies.

Three: A continuous flow of security exploits and incidents. This is not a new trend, but the rise of supply-chain attacks (SolarWinds) and other exploits is only going to make security operations harder without the right people.

The security industry needs people with well-rounded skills in offensive and, more importantly, defensive and implementation security practices.

In this talk, I hope to highlight the key success factors in building and fostering an environment of continuous learning, autonomy, and deliberate practice that companies can leverage to build great security teams. I will detail approaches and anecdotes from my own experience that underscore these points.

Abhay Bhargav, we45

Abhay Bhargav is the Founder of we45, a focused Application Security Company and the Chief Research Officer of AppSecEngineer, an elite, hands-on online training platform for AppSec, Cloud-Native Security, Kubernetes Security and DevSecOps.

He has created some pioneering works in the area of DevSecOps and AppSec Automation, including the world's first hands-on training program on DevSecOps, focused on Application Security Automation.

Abhay is a speaker and trainer at major industry events including DEF CON, BlackHat, OWASP AppSecUSA, EU and AppSecCali. His trainings have been sold-out events at conferences like AppSecUSA, EU, AppSecDay Melbourne, CodeBlue (Japan), BlackHat USA, SHACK and so on.

11:15 am–12:00 pm (PDT)

Break

12:00 pm–1:30 pm (PDT)

Track I

It's Time to Debloat the Cloud with Unikraft

Thursday, 12:00 pm–12:45 pm

Felipe Huici, NEC Laboratories Europe GmbH

Cloud computing has revolutionized the way we think about IT infrastructure: Another web server? More database capacity? Resources for your artificial intelligence use case? Just spin up another instance and you are good to go. That's the hype, but the reality is that the cloud, whether public or private, is severely bloated: instances can run up to GBs in size, "fast" boot times can take on the order of minutes, and memory consumption for running a single simple service (e.g., a static content web server) can be exorbitant. In this talk, we will present the Linux Foundation Unikraft project, a build tool and framework for seamlessly generating extremely efficient yet high-performance cloud-ready images that are each tailored to the needs of specific applications. Our evaluation using off-the-shelf applications such as nginx, SQLite, and Redis shows that running them on Unikraft results in a 1.7x-2.7x performance improvement compared to Linux guests. Unikraft is a Linux Foundation open source project and can be found at www.unikraft.org.

Felipe Huici, NEC Laboratories Europe GmbH

Felipe is a chief researcher at NEC Laboratories Europe in Heidelberg, Germany. His main research interests lie in the areas of high-performance software systems, and in particular specialization, virtualization and security. He has been published in several top-tier conferences and journals such as SOSP, Eurosys, SIGCOMM, NSDI, CoNEXT, and SIGCOMM CCR and regularly acts as a TPC member of conferences and journals such as IMC, INFOCOM, CoNEXT and SIGCOMM CCR.

Can Infrastructure as Code Apply to Bare Metal?

Thursday, 12:45 pm–1:30 pm

Rob Hirschfeld, RackN

Infrastructure as Code (IaC) uses source control, CI/CD and immutable configuration to build cloud infrastructure. What does it take to apply these concepts to bare metal too? We'll discuss the intersection of IaC practices with the realities of physical infrastructure automation. We'll review production-proven techniques that build secure and resilient infrastructure.

Rob Hirschfeld, RackN

Do you keep wondering why building automation is so hard and even harder to share as a community? That really bugs Rob too. He has been creating software to collaboratively automate infrastructure for over 20 years. His latest startup, RackN, focuses on providing Distributed IaC automation and abstraction layers for provisioning Cloud, Edge and Enterprise data centers. He is also building a forward-looking operator community at the2030.cloud with weekly DevOps and future hallway-type discussions.

Track II

Service Mesh Up and Running with Linkerd

Thursday, 12:00 pm–12:45 pm

Charles Pretzer

A service mesh is an abstraction layer that binds together microservices-based distributed systems by providing observability, security, and reliability.

This talk will take that description, remove the jargon, and use a demo to show you the power of a service mesh using Linkerd.

Charles Pretzer

Charles Pretzer is a field engineer at Buoyant, where he spends his time collaborating and engaging with the open source community of the CNCF service mesh, Linkerd. He also enables production-level adoption by helping companies integrate Linkerd into their Kubernetes-based applications. Charles has spoken at meetups and conferences hosted by ABN Amro, Macnica, and at the NGINX Conference. When he's not presenting or hacking away at his computer, he's riding a motorcycle or making a delicious mess in the kitchen.

The What and Why of Documenting Your Infrastructure

Thursday, 12:45 pm–1:30 pm

Kevin Metcalf, Foothill-De Anza Community College District

When our only Linux admin for 70+ physical and virtual machines announced his retirement, the job of managing this undocumented mess fell to me. This talk is a humorous look at how I ended up over my head, and what I did about it. Folks should come away thinking about documentation in terms of respect and equity.

1:30 pm–1:45 pm (PDT)

Break

1:45 pm–2:35 pm (PDT)

Closing Remarks

Program Co-Chairs: Carolyn Rowland, National Institute of Standards and Technology (NIST), and Avleen Vig, Facebook

Closing Talk

Practical Kubernetes Security Learning using Kubernetes Goat

Thursday, 1:45 pm–2:35 pm

Madhu Akula, Miro

Kubernetes Goat is a "vulnerable by design" Kubernetes cluster environment for practicing and learning about Kubernetes security. In this session, Madhu Akula will present how to get started with Kubernetes Goat by exploring different vulnerabilities in Kubernetes cluster and containerized environments. He will also demonstrate real-world vulnerabilities and map the Kubernetes Goat scenarios to them. We will see the complete documentation and instructions for practicing Kubernetes security and performing security assessments. As a defender, you will see how learning these attacks and misconfigurations helps you understand and improve your cloud native infrastructure security posture.

Madhu Akula, Miro

Madhu Akula is the creator of Kubernetes Goat, an intentionally vulnerable-by-design Kubernetes cluster for learning and practicing Kubernetes security. He is also a published author and a Cloud Native security architect with extensive experience, and an active member of the international security, DevOps, and Cloud Native communities (null, DevSecOps, AllDayDevOps, etc.). He holds industry certifications such as OSCP (Offensive Security Certified Professional) and CKA (Certified Kubernetes Administrator). Madhu frequently speaks and runs training sessions at security events and conferences around the world, including DEFCON (24, 26 & 27), BlackHat USA (2018 & 19), USENIX LISA (2018 & 19), O’Reilly Velocity EU 2019, GitHub Satellite 2020, Appsec EU (2018 & 19), All Day DevOps (2016, 17, 18, 19 & 20), DevSecCon (London, Singapore, Boston), DevOpsDays India, c0c0n (2017, 18), Nullcon (2018, 19), SACON 2019, Serverless Summit, null, and multiple others. His research has identified vulnerabilities in 200+ companies and organizations, including Google, Microsoft, LinkedIn, eBay, AT&T, WordPress, NTOP, and Adobe, and he has been credited with multiple CVEs, acknowledgements, and rewards. He is co-author of Security Automation with Ansible 2 (ISBN-13: 978-1788394512), which is listed as a technical resource by Red Hat Ansible, and a technical reviewer for the book Learn Kubernetes Security. He also won first prize for building an infrastructure security monitoring solution at the InMobi flagship hackathon among 100+ engineering teams.