SRE in the Small and in the Large

Niall Murphy; Todd Underwood

A variety of topics are covered at LISA16.

You can combine days of training or workshops with days of Conference Program content to build the conference that meets your needs. Pick and choose the sessions that best fit your interests: focus on just one topic or mix and match.

Wednesday, December 7, 2016

7:30 am–5:00 pm

On-Site Registration and Badge Pickup

Grand Ballroom Foyer

7:30 am–9:00 am

Continental Breakfast

Grand Ballroom Foyer

9:00 am–9:15 am

Opening Remarks

LISA16 Co-Chairs: John Arrasjid, Dell EMC, and Matt Simmons, SpaceX

Constitution Ballroom

9:15 am–10:30 am

Keynote Address

Constitution Ballroom

The Evolution of People, Process, and Technology in the Digital Age

Wednesday, 9:15 am–10:30 am PST

John Roese, Dell EMC EVP and CTO of Cross Product Operations

As the IT industry moves ever faster into the Digital Age, many facets of it are being affected, including people's roles in IT, the technology needed to support newer applications and use cases, and even existing IT processes. We will discuss the impact this IT transformation is having on these aspects of the industry, and on you.

John Roese, Dell EMC, EVP & CTO of Cross Product Operations

John Roese is Executive Vice President and Chief Technology Officer at EMC Corporation, where he is responsible for defining its technology vision and strategy. He plays a key role in shaping that strategy as EMC embarks on its next phase of growth through its merger with Dell. Mr. Roese is Founding Partner of ICT Advisory Group LLC. He formerly held executive-level positions at Huawei Technologies, FutureWei, Nortel Networks, Broadcom Corporation, Enterasys Networks, Inc., and Cabletron Systems. He has 20 granted and pending patents. He holds a Bachelor of Science degree in Electrical and Computer Engineering from the University of New Hampshire.

10:30 am–11:00 am

Break with Refreshments

Grand Ballroom Foyer

11:00 am–12:30 pm

Talks I

Constitution Ballroom A

UAVs, IoT, and Cybersecurity

Wednesday, 11:00 am–11:45 am PST

David Kovar

Small Unmanned Aerial Systems (sUAS) aka “drones” are all the rage—$500 UAVs are used in professional racing leagues and major corporations are building $100,000 UAVs to deliver packages and Internet connectivity. UAVs are slowly working their way into almost every commercial sector via operations, sales, manufacturing, or design.

sUAS (emphasis on the final "S") are complex systems. The aerial platform alone often consists of a radio link, an autopilot, a photography subsystem, a GPS, and multiple other sensors. Each of these components represents a cybersecurity risk on its own, and again as part of the larger system. Add in the ground control stations, the radio controller, and the video downlink system, and you have a very complex computing environment running a variety of commercial, closed source, open source, and homebrew software.

And yes, there is already malware specifically targeting drones.

During this presentation, we will walk through a typical operational workflow for a UAV, all of the components of a representative system, and a possible risk assessment model for UAVs. Even if you are not working with UAVs, consider that UAVs are an instance of "the Internet of Things": a collection of sensors and computing devices, connected to each other and to the cloud, designed to gather, distribute, and analyze data in a semi- or fully-autonomous manner.

David Kovar

David Kovar was recently a cyber security and incident response leader for a major consulting firm. He shifted focus to disruptive technologies and is currently pursuing a Master’s degree in International Affairs while consulting on UAVs. He runs a commercial UAV company that provides disaster response, precision agriculture, surveying, and other aerial imaging services. He’s also been an entrepreneur, ediscovery consultant, software engineer, search and rescue incident commander, executive protection agent, and lethal forensicator. He’s collected images in China, rescued wayward Americans in Australia, fenced with APT actors from all over the world, and led a mission to Tajikistan to evaluate the emergency preparedness of many local agencies. Oh, and he flies sailplanes, fixed wings, helicopters, and drones.

Stop Lying to Your Customers—the Cloud Is Neither Private Nor Secure: What Your Customers Need to Do for Privacy and Security, and How You Can Help Them

Wednesday, 11:45 am–12:30 pm PST

James "Brad" Whitehead, Chief Scientist, Formularity

While we assure our customers and clients that the cloud is "safe," we are fooling both them and ourselves. In a typical cloud service, we send information through Transport Layer Security (TLS) ["SSL"] or Virtual Private Networks (VPNs); store it in encrypted databases; process it on dedicated virtual machines; and often send results back by TLS or VPNs. We follow the best practices of both the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry (PCI) communities: "encryption in motion" and "encryption at rest." We point out how these services and protocols protect the sensitive health, financial, and personal information of our customers. In truth, this cloud-based information lifecycle leaks sensitive information like a sieve! The worst part is that, as cloud architects and providers, we know it! We just like to gloss over it and pretend it's "somebody else's problem" (points to anybody who remembers Douglas Adams's "Life, the Universe and Everything" and the 'SEP field' [http://hitchhikers.wikia.com/wiki/Somebody_Else's_Problem_field]).

In the best practices, we talk about the state of "data in motion" and the state of "data at rest." So what happens during the transition from motion to rest? We know that the data becomes visible, human-readable plain text. This is just one of at least five different places where "data in motion" can be decrypted, intercepted, and recorded during a normal TLS (SSL) connection. A similar set of problems exists with storing and processing sensitive information in databases and services in the cloud.

In the same manner that we have specified TLS connections in the past to protect data in motion, we can now specify end-to-end encryption to protect sensitive information as it flows in and out of TLS, VPN, and Virtual Local Area Network (VLAN) pipes. By using the newly emerging technology of homomorphic encryption, we can store AND PROCESS encrypted information in the cloud without ever decrypting it. Not only does this truly provide the type of protection we have led our customers to believe is currently present in the cloud, but it also relieves us, as cloud providers, of tremendous risk and liability. If the cloud provider never has access to the information being processed in its data center, it can't be held responsible for any breaches or hacks. How much is this liability? Well, a year of credit monitoring, a common compensation for loss of Personally Identifiable Information (PII), costs approximately $50 per person. Lose 10 million records (and 10 million wouldn't even make it into the top ten breaches last year), and you're looking at a liability of half a billion dollars.
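"Homomorphic encryption" covers several schemes, and the abstract does not say which one it has in mind. As a minimal illustration of computing on data without decrypting it, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes make it completely insecure; this is a teaching sketch, and Paillier supports only addition (and multiplication by a plaintext constant), not arbitrary computation.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1  # common simplified choice of generator
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n using fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c, k):
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)


# The "cloud" computes the sum while only ever handling ciphertexts.
pub, priv = keygen(61, 53)
c_sum = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, c_sum))  # 42
```

Only the key holder can run `decrypt`; whoever performs `add_encrypted` sees nothing but large random-looking integers. Real deployments use primes of 1024 bits or more, which is also where the performance cost that skeptics point to comes from.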

Technologies like end-to-end encryption, homomorphic encryption, always-encrypted databases, and re-encryption proxies are not proprietary technologies. They are available from multiple commercial and open source providers. We just need to start using them as the new standards in "best practices" to provide our customers and stockholders with the safety and privacy they think they already have.

James "Brad" Whitehead, Chief Scientist, Formularity

Brad Whitehead is Chief Scientist for Formularity, an electronic forms company dedicated to the secure collection and processing of personal information. Formerly, he was a Partner and Master Technology Architect with Accenture. Brad has architected and implemented numerous national-scale information processing systems, and served as an IT security advisor to several US Federal agencies. Brad holds a BS in Artificial Intelligence from Carnegie Mellon University and an MS in Information Technology from the University of Liverpool. He can be reached at brad.whitehead@formularity.com

Talks II

Constitution Ballroom B

Building a Billion User Load Balancer

Wednesday, 11:00 am–11:45 am PST

Patrick Shuff, Facebook

Want to learn how Facebook scales its load balancing infrastructure to support more than 1.3 billion users? We will reveal the technologies and methods we use to globally route and balance Facebook's traffic. The Traffic team at Facebook has built several systems for managing and balancing our site traffic, including both a DNS load balancer and a software load balancer capable of handling several protocols. This talk will focus on these technologies and how they have helped improve user performance, manage capacity, and increase reliability.
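The abstract does not say how Facebook's load balancers choose backends, so the following is not their design. It illustrates one common, generic technique for keeping a given client on the same backend while backends are added or removed: consistent hashing. In this minimal Python sketch, the backend names and the choice of 100 virtual nodes per backend are arbitrary assumptions.

```python
import bisect
import hashlib


def _point(label):
    """Map a string to a stable 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (a generic sketch)."""

    def __init__(self, backends, vnodes=100):
        self.vnodes = vnodes
        self._ring = []    # sorted (point, backend) pairs
        self._points = []  # the points alone, for bisect
        for backend in backends:
            self.add(backend)

    def _rebuild(self):
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    def add(self, backend):
        # Each backend owns many points on the ring so load spreads evenly.
        self._ring.extend((_point(f"{backend}#{i}"), backend) for i in range(self.vnodes))
        self._rebuild()

    def remove(self, backend):
        self._ring = [(p, b) for p, b in self._ring if b != backend]
        self._rebuild()

    def lookup(self, key):
        """Return the backend owning the first ring point clockwise from the key."""
        if not self._ring:
            raise LookupError("no backends on the ring")
        i = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._ring[i][1]


ring = HashRing(["lb-a", "lb-b", "lb-c"])
before = {k: ring.lookup(k) for k in (f"client-{i}" for i in range(1000))}
ring.remove("lb-c")  # only keys that lb-c owned should move
```

After `lb-c` is removed, every key that `lb-a` or `lb-b` owned still maps to the same backend. With a plain `hash(key) % N` scheme, shrinking N from 3 to 2 would instead move most keys.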

Patrick Shuff, Facebook Inc.

Patrick Shuff is a production engineer on the traffic team at Facebook. His team's responsibilities include maintaining and monitoring the global load balancing infrastructure, DNS, and Facebook's content delivery platform (i.e., photo/video delivery). His other roles at Facebook include serving on the global site reliability team, where he works with various infrastructure teams (messaging, real-time infrastructure, email) to help increase reliability and improve monitoring for their services.

Sensu and the Art of Monitoring

Wednesday, 11:45 am–12:30 pm PST

Sean Porter, Sensu

As the complexity and scale of our systems increases, our methods of monitoring them must adapt in order to maintain situational awareness. Our newfound ability to produce and deploy software at a high rate demands increased visibility into production.

We commonly use the term "monitoring" to describe a system made up of several components: fault detection, metric collection, analytics, visualization, and notification. Traditional monitoring tools tend to be monolithic in design, with their own approach to each of these components. An alternative to the monolith is a composable system, a modular design, combining several best-of-breed tools to fulfill the functions of each component.

Sensu is an open source monitoring tool designed for today’s systems. Sensu is commonly referred to as "the monitoring framework," allowing its users to compose a monitoring system to meet their unique demands. Sensu provides a monitoring agent, transport, event processor, HTTP API, and more!
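To make the composable idea concrete: a Sensu check, like a Nagios plugin, is an ordinary program that prints a line of output and reports status through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Deciding who gets notified is left to separate handlers. The following minimal Python disk-usage check is a sketch, not a published Sensu plugin; the 80% and 90% thresholds are hypothetical.

```python
import shutil
from typing import Tuple

# Nagios-style exit codes, which this sketch assumes the monitoring agent interprets.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Turn a measurement into an (exit status, one-line message) pair."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL: {used_pct:.1f}% used (>= {crit}%)"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING: {used_pct:.1f}% used (>= {warn}%)"
    return OK, f"DISK OK: {used_pct:.1f}% used"


def main(path: str = "/") -> int:
    """Measure disk usage, print the message, and return the status code."""
    try:
        usage = shutil.disk_usage(path)
        status, message = evaluate(100.0 * usage.used / usage.total)
    except (OSError, ZeroDivisionError) as exc:
        status, message = UNKNOWN, f"DISK UNKNOWN: {exc}"
    print(message)
    return status
```

As a standalone plugin, the script would end with `sys.exit(main())`. The agent runs it on a schedule, and the result travels over the transport to the event processor, which chooses the handlers (email, chat, paging). Swapping any one of those pieces leaves the check untouched, which is the point of a composable design.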

In this presentation, I will discuss each component of a modern monitoring system, comparing several approaches to each of them. I will also cover the advantages and disadvantages of specific tool architectures, talk about Sensu's approach to monitoring, and Sensu at scale!

Sean Porter, Sensu

Sean Porter is a toolsmith with a love for composable systems. He is a practitioner of passion-driven development with an appetite for a good post-mortem. As the author and lead developer of Sensu, the open source monitoring framework, he acts as the primary caretaker of the project. Sean is a partner at Heavy Water Operations, building Sensu Enterprise and helping people automate and monitor their infrastructure.

Talks III

Back Bay Ballroom AB

Modern Cryptography Concepts: Hype or Hope

Wednesday, 11:00 am–11:45 am PST

Radia Perlman, Dell EMC Corporation

There are many topics that get a lot of press, some that are the focus of many academic papers but have not escaped into the popular press, and others that are covered in both. It is important to be able to approach these things with skepticism. Some of the ones in the academic literature are so hard to read that, even though they might be interesting, it would be hard for anyone outside academia to understand them. Some (like homomorphic encryption) have escaped into the popular media, but without a true understanding of how inefficient they might be.

Radia Perlman, Dell EMC

Radia Perlman is a Fellow at EMC. She has made many contributions to the fields of network routing and security protocols, including robust and scalable network routing, spanning tree bridging, storage systems with assured delete, and distributed computation resilient to malicious participants. She wrote the textbook Interconnections, and cowrote the textbook Network Security: Private Communication in a Public World. She holds over 100 issued patents. She has received numerous awards, including induction into the Inventor Hall of Fame, induction into the Internet Hall of Fame, election to the National Academy of Engineering, and lifetime achievement awards from ACM's SIGCOMM and from USENIX. She has a Ph.D. in computer science from MIT.

Strategic Storytelling

Wednesday, 11:45 am–12:30 pm PST

Jessica Hilt, University of California, San Diego

We often want to convince someone to approve a budget item or understand a decision, but fail to persuade them with data alone. "Storytelling" is becoming a hot new business term, but unless you're launching a Kickstarter campaign, it might be hard to see how it applies to day-to-day business. This talk explains IT storytelling and fills in the gap between "buzzword" and practical application by providing step-by-step advice on how to use storytelling in projects, meetings, and interviews to persuade, enlighten, and motivate others.

Jessica Hilt, University of California, San Diego

Jessica Hilt is a geek and a writer. She is the technical outreach strategist for the University of California, San Diego. Prior to UCSD, she worked in data and politics for CompleteCampaigns.com and Aristotle, Inc. For the nonprofit So Say We All, she teaches and coaches storytelling for the stage. Her work has appeared in Bourbon Penn magazine, on stage at The Old Globe, and in various bars around town—with or without provocation.

Mini Tutorial I

Commonwealth Ballroom

Git Crash Course

Wednesday, 11:00 am–12:30 pm PST

Thomas Uphill, Wells Fargo

Knowing how to use git has become a requirement. In this crash course we will cover how git is different from other version control systems and why it is more powerful. To understand better how git can do amazing things, we'll pry open the black box and look inside at hashes, branches, HEAD and remotes. Next we'll move on to explore branching, merging and workflows. Finally we'll look at a few tools to help you use git.

Who should attend:
Any users of git from novice to advanced. If you have code that you are modifying, you need to use git.

Take back to work:

How git works
What does 4b825dc642cb6eb9a060e54bf8d69288fbee4904 mean?
Why git can save you
How to avoid the rip and replace mentality
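Git names every object by hashing a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the content. The following minimal Python sketch of that computation assumes a default SHA-1 repository (Git can also be configured to use SHA-256 object names). It reproduces the mystery hash from the list above, which turns out to be the ID of the empty tree.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git computes for an object of the given type."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
```

Because the ID is derived purely from content, identical files or directories get identical IDs in every repository. This is what lets Git deduplicate storage and detect corruption.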

Topics include:

git internals
local, remotes and git-hooks
branching and merging
stashing and reverting
history and blaming
extra tools

Thomas Uphill, Wells Fargo

Thomas currently works at Wells Fargo as a Puppet Engineer, where he uses git every day. He's been using git for a long time and decided to get past the 'turn it off and back on' mentality with git. He's given a few talks about git at conferences and meetups. He's made a few hook scripts here and there as well. He also knows what 4b825dc642cb6eb9a060e54bf8d69288fbee4904 means. When he isn't working, he volunteers for LOPSA (lopsa.org) and SASAG (sasag.org) or goes riding at the local MTB park. You can find him as @uphillian or on ramblings.narrabilis.com.

Mini Tutorial II

Back Bay Ballroom C

How to Virtualize the End-User's Desktop (or How to Play More Warcraft/Halo during the Day)

Wednesday, 11:00 am–12:30 pm PST

Linus Bourque

The desktop has evolved over the years since its first introduction in 1975, and managing it while meeting end-user expectations has been a challenge for many IT administrators. In this class we will look at the evolution of the desktop and where it is headed (will it even be a desktop?). We will discuss the concept of a use case and what it involves, and then we will look at one method of managing desktops (from small to large scale) based on use cases, using Horizon 7, App Volumes, and User Environment Manager.

We will discuss the need to decouple the user from the desktop, and how IT management will shift from managing the desktop to managing the user's identity, which can be far easier than traditional desktop management. Finally, we'll look at how mobility, and the importance of using MDM (Mobile Device Management), rounds out desktop management.

Who should attend:
Sysadmins, security admins, network admins, and desktop/Windows admins.

Take back to work:

Understand the future of desktop
Understand how to decouple the user from the desktop
Learn how Horizon is used to achieve this
Learn how to provision a pool of desktops
Learn different methods of application provisioning (no one size fits all!)
Determine what use cases are and how to apply them in organizational environments
Understand the difference between broker and broker-less desktop management

Topics include:

History of the desktop
How to install and configure a Horizon broker
How to create AppStacks using App Volumes
How to separate the user identity from the Windows environment
Pool provisioning using full clones, linked clones and instant clones
Understand the difference between stateful and stateless desktop assignment and why stateless is preferred

Linus Bourque, VMware

Linus Bourque is currently a Principal Instructor at VMware and has been part of the VMware family for nearly 11 years. As part of the Americas Education Tech Lead team, he has been the lead instructor for a variety of End-User Computing courses over the last five years (covering topics such as Horizon with View, Mirage, ThinApp, App Volumes, and User Environment Manager). More recently, he has become co-lead for SDDC-Security. In addition, he has been involved in the creation of various Desktop and Mobility certifications, including the VCP, VCAP/VCIX, and VCDX. Linus also wrote the first edition of the VMware Press Official VCP-DT Study Guide.

Linus currently lives in Los Angeles with his wife; their pug, Lafawnda; and their two cats, Georgie and Gupta. When not teaching, he continues to explore Southern California, Lafawnda in tow, in search of a good cigar.

LISA Lab

Back Bay Ballroom D

Beginning Wireshark

Wednesday, 11:00 am–12:30 pm PST

Brett Thorson, Senior Sales Engineer Mid-Atlantic, Dtex Systems

There is an entire world of packets, captures, protocols, and errors just beyond your NIC that remains unexplored for most people. Have you ever wondered what all that data on the wire (or in the air) looks like? Then this tutorial is for you. We'll start by installing Wireshark and capturing some packets. Soon we'll move on to targeting captures, looking for things that make you go Hmmmmm, and then trying to figure out the Hmmms and whys behind it all. When you are done, you'll be able to go back to your office and prove things like "I'm sending out a DHCP request, but your server isn't answering," "I'm sending packets, but you don't seem to be receiving them," and "It seems like my network card light is on A LOT; what exactly is it doing?"

Who should attend:

Anyone interested in learning how to use Wireshark, the free network packet sniffer and protocol analyzer.

Take back to work:

Basic Wireshark

Topics include:

Capturing Packets - Technically Ethernet Frames inside of Ethernet Packets (10 min)
Span port
Tap
Listening on the wire
Yourself
Broadcast packets
Promiscuous mode
Getting & installing Wireshark (we might have remote machines to use instead of each person installing it themselves) (10 min)
The startup screen of Wireshark (5 min)
List of network adapters
Capture Filter (brief & basic, to be covered in Advanced Section)
Capturing Packets GO! (5 min)
The 3 Windows of Wireshark
Packet List (15 min)
Resolve Physical - OUI
Resolve Network - DNS
Resolve Transport - Port
Back and forth
Time display
Packet Details (5 min)
Drilling down
Packet Bytes (0 min)
Almost never used by humans unless you are searching for text.
Big endian gotchas
Display filters (20 min)
Some basics
Building from expressions
Not the same as Capture Filters
Apply as filter
Prepare a filter
Conversation filter
Follow (Stream) (10 min)
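Two items in the outline, OUI resolution and big-endian gotchas, come down to how the first 14 bytes of an Ethernet II frame are laid out. Wireshark decodes all of this for you. The following minimal Python sketch shows what it is doing under the hood. It ignores 802.1Q VLAN tags, and the sample MAC address is made up.

```python
import struct


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode the 14-byte Ethernet II header (no 802.1Q VLAN tag handling)."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet II header")
    # "!" = network byte order (big endian): 6-byte dst, 6-byte src, 2-byte EtherType
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": mac_str(dst),
        "src": mac_str(src),
        "src_oui": mac_str(src[:3]),                      # first 3 octets: vendor OUI
        "ethertype": ethertype,                           # e.g. 0x0800 IPv4, 0x0806 ARP
        "dst_broadcast": dst == b"\xff" * 6,
        "dst_multicast": bool(dst[0] & 0x01),             # I/G bit
        "src_locally_administered": bool(src[0] & 0x02),  # U/L bit: OUI not meaningful
    }


# A broadcast ARP frame: destination ff:ff:ff:ff:ff:ff, EtherType 0x0806.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(28)
hdr = parse_ethernet_header(frame)
print(hdr["src_oui"], hex(hdr["ethertype"]), hdr["dst_broadcast"])  # 00:11:22 0x806 True

# The big-endian gotcha: reading the same two bytes as little endian gives 0x0608.
print(hex(struct.unpack("<H", frame[12:14])[0]))  # 0x608
```

Wireshark's OUI resolution is essentially a lookup of `src_oui` in a vendor table. When the locally administered bit is set, as with many randomized or virtual MACs, that lookup tells you nothing.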

Brett Thorson, Senior Sales Engineer Mid-Atlantic, Dtex Systems

Brett comes to LISA by way of LISA Build. Brett has been involved in building the networks for several conferences, such as ShmooCon, IETF, and Network World + Interop iLabs. He's a huge hobbyist with an almost unending list of interests. Brett enjoys using Wireshark and other security tools to snoop into the places where errors and bugs hide. Brett is currently the Senior Sales Engineer for Dtex Systems for the Mid-Atlantic.

12:00 pm–7:00 pm

LISA Expo

Grand Ballroom Complex

2:00 pm–3:30 pm

Talks I

Constitution Ballroom A

How Should We Ops?

Wednesday, 2:00 pm–2:45 pm PST

Courtney Eckhardt, Heroku

"Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations” --Melvin Conway, in 1968

“How should we handle operations?” is one of the major issues in our industry right now. We’ve mostly agreed that consigning people to ops roles with no chance to develop more skills is bad, but the range of responses to this is wide and confusing. The proliferation of terms like DevOps, NoOps, and SRE is frustrating when we try to tell a potential employer what we can do, or even when we just try to talk shop together. What we’ve done before isn’t working well, but what can we do instead, and how do we even talk about it?

Heroku uses a Total Ownership model to address operational work. I’ll talk about what this means in practice (with examples), the benefits (clear relevance of your work, good feedback cycles, abolishing class hierarchies), the failure modes (pager fatigue, decentralization), and how we can make all jobs in our engineering organizations more humane and rewarding.

Courtney Eckhardt, Heroku

Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don Norman into a wide-reaching and humane concept of operational reliability.

SRE at a StartUp: Lessons from LinkedIn

Wednesday, 2:45 pm–3:30 pm PST

Craig Sebenik, Matterport

SRE works at Google. It works at many large companies. But can it work at a small startup? Over the past two years, Matterport has continued to embrace SRE (even though we don't use that exact term). I learned many of the strategies I have used to move SRE forward while at LinkedIn.

In this talk, I will give a quick overview of how SRE evolved at LinkedIn and then how I applied those lessons to a startup. Some worked well, others not as much. But after just a couple of years, we are in a much better position to grow and offer a stable and reliable platform to our customers.

Yes, SRE works at Google. But it can work for small companies as well.

Craig Sebenik, Matterport

Craig is the Lead Infrastructure Engineer (aka SRE) at Matterport. He joined when Matterport had a couple dozen employees; now it has a couple hundred. Prior to Matterport, he was a Staff SRE at LinkedIn, where he worked on the SRE infrastructure team (metrics, configuration management using Salt, etc.). Going further back, he has worked at startups that have failed and companies that have done well, done chemistry research, and even attended cooking school for a couple of years.

Talks II

Constitution Ballroom B

Capturing All of Stack Overflow's Logs

Wednesday, 2:00 pm–2:45 pm PST

George Beech, Stack Overflow

This talk will cover the methods that Stack Overflow uses to capture, query, and explore our logs. We use two systems to store and query logs. First, we have a custom processor that inserts every web log into a SQL database. Second, we send all system logs to a large ELK cluster for exploration, graphing, and ad-hoc reports.

I will cover how we handle the large volume of logs we ingest, the SQL technologies we have used, and how our 300TB ELK cluster is configured.

I will also go over the pain points that we have experienced while managing and developing our logging system.
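The abstract does not describe Stack Overflow's log format, schema, or database. The following is therefore only a sketch of the general idea behind the first system: parse each web log line and insert it into a SQL table. It uses Python's built-in `sqlite3` module as a stand-in for a production SQL server, and the line format, the `web_logs` table, and the `ingest` helper are all hypothetical.

```python
import re
import sqlite3

# Hypothetical line format: "<ISO timestamp> <METHOD> <path> <status> <duration>ms"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms$"
)


def parse(line):
    """Return a row tuple for a well-formed line, or None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return (m["ts"], m["method"], m["path"], int(m["status"]), int(m["ms"]))


def ingest(conn, lines, batch_size=500):
    """Parse lines and insert them into web_logs in batches; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS web_logs "
        "(ts TEXT, method TEXT, path TEXT, status INTEGER, duration_ms INTEGER)"
    )
    insert = "INSERT INTO web_logs VALUES (?, ?, ?, ?, ?)"
    batch, inserted = [], 0
    for line in lines:
        row = parse(line)
        if row is None:
            continue  # a real pipeline would count or quarantine unparseable lines
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            inserted += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
        inserted += len(batch)
    conn.commit()
    return inserted
```

Once the logs are rows, questions like "how many 5xx responses in the last hour?" become ordinary SQL queries. Rows are buffered and written in batches, which starts to matter as volume grows. The single commit at the end keeps the sketch simple.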

George Beech, Stack Overflow

George is a Site Reliability Engineer for Stack Overflow, a top 50 website and a highly regarded resource for developers. With 15 years of experience in the IT industry, George has designed, implemented, and managed systems of all sizes—from small, two-computer operations to Enterprise systems with thousands of machines. As a generalist, he enjoys working on all aspects of the datacenter—Physical, Network, Server, and OS.

George is passionate about the IT profession and enjoys sharing his experience and work with others in the field; he speaks at various conferences and meetups throughout the year. He serves as the President of LOPSA, the League of Professional System Administrators. In addition to his love of all things technology, he is an avid sports fan and video gamer.

The Hard Truths about Microservices and Software Delivery

Wednesday, 2:45 pm–3:30 pm PST

Anders Wallgren, Chief Technology Officer, Electric Cloud and Avantika Mathur, Product Manager, ElectricFlow

Everybody’s talking about microservices right now. But are you having trouble figuring out what they mean for you?

As software organizations continue to invest in achieving faster release cycles and Continuous Delivery (CD) of their applications, we see increased interest in microservices architectures, which, on the face of it, seem like a natural fit for enabling CD.

With microservices, what was once one application, with self-contained processes, is now a complex set of independent services that connect via the network. Each microservice is developed and deployed independently, often using different languages, technology stacks, and tools. While microservices support agility, particularly on the development side, they come with many technical challenges that greatly impact your software delivery pipelines, as well as other operations downstream.

Are you considering Microservices?
Do they make sense for your particular use case?
What are some of the “gotchas” you should be aware of?
Are you looking for best practices on how to get started with microservices?
Are you looking for tips for designing your delivery pipeline(s) for microservice-driven apps?
How will your existing practices need to change to take advantage of microservices?

This session outlines some of the hard truths and challenges with microservices—among them the impact of the mono/micro hybrid state; increased pipeline variations; enforcing governance and compliance standards; complexities of integration testing and monitoring and operations across the growing heterogeneous environments; and difficulties around system-level visibility and management.

Next we will outline concrete best practices on how to address these challenges as you get started with implementing microservices and designing your pipeline and processes to support microservices-driven applications.

Anders Wallgren, Electric Cloud

Anders Wallgren is Chief Technology Officer at Electric Cloud. Anders has over 25 years’ experience designing and building commercial software. Prior to joining Electric Cloud, he held executive positions at Aceva, Archistra, and Impresse, and management positions at Macromedia (MACR), Common Ground Software, and Verity (VRTY), where he played critical technical leadership roles in delivering award winning technologies such as Macromedia’s Director 7. Anders holds a BSc. from MIT. He has previously presented at Agile 2016, Velocity Amsterdam, DevOps Enterprise Summit 2014 and 2015, and PuppetConf, to name a few.

Avantika Mathur, ElectricFlow

Avan (Avantika) Mathur is the Product Manager for ElectricFlow. In her previous role, Avan was the Global Technical Account Manager at Electric Cloud, helping large enterprises across Finserv, Retail, and Embedded accelerate their DevOps adoption. Avan has worked with customers to design complex automation solutions and optimize their delivery pipelines, to speed up software-driven innovation and increase Agile throughput. Avan holds a degree in Computer Science. Prior to Electric Cloud, she worked at IBM for five years as a Linux kernel developer.

Talks III

Back Bay Ballroom AB

Traps and Cookies: A Mystery Package from Your Former Self

Wednesday, 2:00 pm–2:45 pm PST

Tanya Reilly, Google

Does your production environment expect perfect humans? Does technical debt turn your small changes into minefields? This talk highlights tools, code, configuration, and documentation that set us up for disaster. It discusses common traps that we can disarm and remove, instead of spending precious brain cycles avoiding them. And it offers practical advice for sending your future self (and future coworkers!) little gifts, instead of post-mortems that just say “human error :-(”.

Includes stories of preventable outages. Bring your schadenfreude.

Tanya Reilly, Google

Tanya Reilly has been a Systems Administrator and Site Reliability Engineer at Google since 2005, working on low level infrastructure like distributed locking, load balancing, bootstrapping and monitoring systems. Before Google, she was a Systems Administrator at eircom.net, Ireland's largest ISP, and before that she was the entire IT Department for a small software house.

How to Succeed in Ops without Really Trying

Wednesday, 2:45 pm–3:30 pm PST

Chastity Blackwell, Yelp

The last few years have been a time of immense change in the field of operations; not only are there new technologies popping up every day, but the entire way of doing operations seems to be changing. “System administration” jobs are becoming “devops engineers,” “site reliability engineers,” “production engineers,” or something else that no one seems exactly sure how to define. The proliferation of cloud services is making it easier for some companies to avoid having any purely operations organization at all (or at least think they can). In this kind of climate, how does anyone, especially people without much experience or who feel like they are years behind the curve, keep up with the pace of change? How do you make sure your skills are still keeping you viable in the job market? And how do you do all this without feeling like you're going to have to spend every waking moment keeping up? This talk will discuss some of the secrets to being a successful operations engineer without sacrificing everything else.

Chastity Blackwell, Yelp

Chastity Blackwell took her first job as a system administrator in 1999 just to pay the bills until she could get a writing job. After 12 years working in infrastructure operations at the University of Illinois, she decided this might actually be a career, and was lured out to the Bay Area to work for a startup. She survived a yearlong stint as a manager before recently returning to the front lines as a Site Reliability Engineer at Yelp.

Mini Tutorial I

Commonwealth Ballroom

S, M, and L Logstash Architectures: The Foundations

Wednesday, 2:00 pm–3:30 pm PST

Jamie Riedesel, HelloSign

LogStash can scale. From all-in-one boxes (S) to architectures that involve routing log lines to separate parsing clusters (L), LogStash can do it. In this talk I will go over the foundations of LogStash architectures, such as the components of LogStash, where it can be deployed, how it works with ElasticSearch, and an overview of human interfaces to this data.

Who should attend:
People new to LogStash who are looking to learn more about how it works.

Take back to work:
People will come away with an improved understanding of how LogStash works under the hood, ways it can be used, and some ideas on how to display all of that information.

Topics include:
LogStash, Kibana, Grafana, Syslog, and ElasticSearch

Jamie Riedesel, HelloSign

Jamie Riedesel is a DevOps Engineer at HelloSign and has been performing acts of systems administration and engineering since 1997, and more dev-like things since 2010. She moved from corporate IT to the startup space in 2010 and experienced the good kind of culture shock. Jamie has been blogging as sysadmin1138 since 2004, a community elected moderator on ServerFault since 2010, and awarded the Chuck Yerkes community award by LOPSA in 2015.

Mini Tutorial II

Back Bay Ballroom C

Quick Start to Pacemaker HA

Wednesday, 2:00 pm–3:30 pm PST

Mike Diehn

Want to get started with a Pacemaker HA cluster? In Linux? Here you go.

I'll guide you through the debris field of Google hits on the topic and teach you what pieces you need, how they fit together and where not to step to avoid falling into the quicksand.

Who should attend:
People who are sick of having heavily used app servers offline while they rush around getting them or their back-end systems working again.

Take back to work:

Where to find documentation on Pacemaker.
An example architecture for an haproxy, tomcat, postgresql high-availability cluster they can expand or modify.
Files of example commands used to create that cluster.
The confidence that Pacemaker can help them!

Topics include:

Pacemaker HA
Tomcat installation and config basics
Postgresql basics, including simple installation and config for replication
HAProxy basics

Mike Diehn

Mr. Diehn works for software development companies, building and running the various systems that the devs use to do their work. These days he spends most of his time on CentOS Linux systems running Atlassian software. He also serves as the resident regexp and shell guru and helps with the dishes from time to time (DNS, DHCP, blah, blah). If you want to talk about poultry, fishing, male children, dogs, science fiction, the US Navy, brilliant wives, living in New Hampshire, or other such stuff, well, he can talk about that, too.

LISA Lab

Back Bay Ballroom D

Core Skills: Scripting for Automation

Wednesday, 2:00 pm–5:30 pm PST

Mike Ciavarella, Coffee Bean Software Pty Ltd

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

This class is a practical crash course in how—using a combination of bash, Perl, and friends—you can write useful scripts that solve real-world system administration problems.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and of how to edit files in your favorite flavor of *nix is assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete in-class tasks. Time in the LISA lab will also be scheduled to complement this class.

Who should attend:
Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:
Understanding of common scripting patterns and techniques

Topics include:
An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

3:30 pm–4:00 pm

Break with Refreshments

Grand Ballroom Foyer

4:00 pm–5:30 pm

Talks I

Constitution Ballroom A

Behind Closed Doors: Managing Passwords in a Dangerous World

Wednesday, 4:00 pm–4:45 pm PST

Noah Kantrowitz

Available Media

Secrets come in many forms: passwords, keys, and tokens. All are crucial for the operation of an application, but each is dangerous in its own way. In the past, many of us have pasted those secrets into a text file and moved on, but in a world of config automation and ephemeral microservices these patterns are leaving our data at greater risk than ever before.

New tools, products, and libraries are being released all the time to try to cope with this massive rise in threats, both new ones and old ones that have long been ignored. This talk will cover the major types of secrets in a typical web application, how to model their security properties, what tools are best for each situation, and how to use them with major web frameworks.

Noah Kantrowitz

Noah Kantrowitz is a web developer turned infrastructure automation enthusiast, and all around engineering rabble-rouser. By day he builds tools and teaches, and by night he works with the Python Software Foundation infrastructure team. He is an active member of the Chef community, and enjoys merge commits, cat pictures, and beards.

The Road to Mordor: Information Security Issues and Your Open Source Project

Wednesday, 4:45 pm–5:30 pm PST

Amye Scavarda, Red Hat

Available Media

From time to time, communities will run across information security incidents. In the course of project expansion, it always seems like a good idea to spin up a new instance of Something_With_A_Database without writing down the credentials or thinking very clearly about what the permissions are on that new instance. If you're involved in open source for any length of time, you're going to discover a hack at some point. Fortunately, The Lord of the Rings is a great model for dealing with your information security issues.

I'll cover:

The forging of the ring: or how this stuff happens in the first place
How Gollum became corrupted: what happens when you don't work in a timely manner to resolve these things
The cast of characters: someone on your team is going to be Gandalf. You might not always have a ranger who comes out of the shadows and saves you
The journey to Rivendell: what effective discovery on an information security incident looks like
The council of Elrond: what to do after you've gone through discovery and now you need input
The mines of Moria: what happens when you don't do a thorough discovery, and/or information comes to light that should not have been forgotten
Getting waylaid on the road: challenges within the team and balancing out different needs around disclosure and resolution
Good grief, Boromir: Someone who has different ideas even after the Council of Elrond
Actually getting the ring to Mordor: Resolution/launch, disclosure
Going back and cleaning up the Shire: Making sure you're in a better place at the end

Talks II

Constitution Ballroom B

An Architect's Guide to Designing Risk

Wednesday, 4:00 pm–4:45 pm PST

Daemon Behr, Industry professional

Available Media

This talk is meant to expand the view of the Architect to take in considerations that are not always in plain sight. By looking at problems from the perspective of other business units, analyzing cause and effect with regression models, and using psychological techniques, the Architect can create solutions that are more robust, adaptable, and cost-effective.

Daemon Behr, Industry professional

Daemon Behr is a Systems Engineer at Scalar Decisions. He has over 20 years of industry experience and is a virtualization and converged infrastructure subject matter expert. He has taught courses on infrastructure design and security at BCIT and UBC, and has presented at the OpenStack Summit, VMworld, and various other events across Canada and the US.

Engineer the Future

Wednesday, 4:45 pm–5:30 pm PST

Jinnah Hosein, SpaceX

Available Media

This presentation will be a discussion of software engineering, CI/CD, and development workflow at SpaceX.

Talks III

Back Bay Ballroom AB

Don't Burn Out or Fade Away: Conquering Occupational Burnout

Wednesday, 4:00 pm–4:45 pm PST

Avleen Vig, Etsy

Available Media

Occupational burnout is felt by many people at some point in their career. We'll discuss what burnout is, the causes, symptoms, and impacts of burnout, as well as ways to recover from it. This talk brings together scientific research with a personal story of burning out and returning to health over the course of 12 months.

Occupational burnout is a concern for employees and employers. It is common across all industries and companies regardless of the number of employees, revenue, types of projects or hours worked.

Burnout is also deceptive: often the person experiencing it doesn’t even realize they are in the middle of an episode. At best, a burnt-out employee can work their way through it. At worst, they break friendships, burn bridges, and leave what would otherwise be happy employment with a trail of sadness in their wake.

During this talk, we’ll take a walk through a recent, long-lasting episode of burnout, discussing each point, how it relates to the research on the topic, and what the current research teaches us. We’ll especially look at methods to identify burnout in yourself and others, and how to break the cycle before it’s too late.

Avleen Vig, Etsy, Inc.

Avleen is a Staff Operations Engineer at Etsy, where he spends much of his time growing the infrastructure for selling knitted gloves and cross-stitch periodic tables. Before joining Etsy he worked at several large tech companies, including EarthLink and Google, as well as a number of small successful startups.

You Can't Build a Team in the Thunderdome: Better Hiring through Empathy

Wednesday, 4:45 pm–5:30 pm PST

Ryan McKern, Puppet

Available Media

A strong team is more than a loosely affiliated assemblage of individuals or an echo chamber of like-minded people who speak as one multi-headed hive mind. Hiring new people for your strong team is probably one of the most challenging tasks your team will have to do. All too often strong technical teams use some variation of the "Standard Technical Interview." This self-propagating interview "process" seems to be designed to both wear out the team giving the interview and emotionally flatline any candidates subjected to it.

I believe that hiring is one of the most important contributions you will make to your organization. Hiring well should be about more than just getting Unicorn candidates to sign on the dotted line. After years of technical interviews with different types of organizations, I’ve realized that most technical interviews suffer from focusing too deeply on the problems the team had yesterday instead of the team they want to be tomorrow.

I want to talk about what happened when my team tried treating candidates like peers who already had the job instead of giving in to Repetition Compulsion and inflicting trial-by-combat on them just because that's how we were hired. Interviewers felt like they had a better grasp on their role in the process, candidates did not feel like there was some secret handshake or passphrase they were missing, and the company didn’t collapse even though nobody wrote pretend-code on a whiteboard!

Ryan McKern, Puppet

Ryan McKern has been a dishwasher, a bakery clerk, a telemarketer, and a Taco Bell employee. As a member of the Release Engineering team at Puppet, he now uses those skills to teach others to replace soda syrup and CO2 in the company kitchen. Having abandoned Boston winters for Portland’s apartment shortage, he’s excited to return to Boston and share the techniques he’s used to survive the horrors of countless "Standard Technical Interviews." He is a friend to all cats, an avid soda connoisseur, and hard of hearing from too many concerts without earplugs.

Mini Tutorial I

Commonwealth Ballroom

S, M, and L Logstash Architectures: Reaching For the Sky

Wednesday, 4:00 pm–5:30 pm PST

Jamie Riedesel, HelloSign

LogStash can scale. From all-in-one boxes (S) to architectures that involve routing log-lines to separate parsing clusters (L), LogStash can do it. If you have the foundations of LogStash down, we can talk about scaling it up. From architectures with syslog as the collector and LogStash purely as a parser, to architectures where LogStash is acting as both collector and parser, you will run into scaling issues as you get bigger. We will go over scaled up and out architectures, and equip you with the knowledge of what XL might look like for you.

Who should attend:
People with experience with LogStash, looking to find out how to make it scale, and do more.

Take back to work:
People will come away with knowledge of various scaled out LogStash architectures, and how to set up ElasticSearch.

Topics include:
LogStash, ElasticSearch, Syslog, TSDB, Collectd, and AWS

Jamie Riedesel, HelloSign

Jamie Riedesel is a DevOps Engineer at HelloSign and has been performing acts of systems administration and engineering since 1997, and more dev-like things since 2010. She moved from corporate IT to the startup space in 2010 and experienced the good kind of culture shock. Jamie has been blogging as sysadmin1138 since 2004, a community elected moderator on ServerFault since 2010, and awarded the Chuck Yerkes community award by LOPSA in 2015.

Mini Tutorial II

Back Bay Ballroom C

Release Pipelines in Microsoft Ecosystems

Wednesday, 4:00 pm–5:30 pm PST

Warren Frame, Harvard University, and Michael Greene, Microsoft

Available Media

A 90-minute walkthrough of CI/CD release pipelines in Microsoft-oriented ecosystems.

The session will cover:

Why the pipeline is important, even in a shop that isn't practicing DevOps or rolling infrastructure-as-code.
Each component in the pipeline: source control, build, test, and release, along with some of the benefits, goals, and common tools for each.
Practical, hands-on examples of each component in the pipeline on hosted services (e.g., GitHub and AppVeyor), with a backup plan in case the network or GitHub misbehaves.
At the end, we'll demo the pipeline as a whole, from managing simple config / script files, to defining, spinning up, and testing infrastructure changes.
We'll use common tools in this session, but will highlight that each can be swapped out for a variety of alternatives. We'll use GitHub (source), psake (build), Pester (test), PSDeploy (release), Test-Kitchen (infrastructure test harness), and AppVeyor (build system). For example, the audience could swap out AppVeyor for Jenkins or TeamCity, or run the entire pipeline in GitLab CE, with GitLab CI running the build/test/release phases.

Who should attend:
Attendees interested in:

Improving their workflow in a Microsoft ecosystem, whether they work with scripts and config files, infrastructure-as-code, or business critical software
Improving their security posture, reducing recovery times, simplifying change management, and making life easier
A hands-on, practical demonstration of a release pipeline that could be used for open source projects, PowerShell, DSC, and other code or configurations you write
Hands-on experience with source control (GitHub), build (psake), testing (Pester), release (PSDeploy), and a build system to tie this all together (AppVeyor)

Take back to work:

Practical experience with common tools used in Microsoft-oriented release pipelines: Git, GitHub, Pester, and AppVeyor
Knowledge of the release pipeline concepts that you could use to design your own pipeline, regardless of the underlying tools
References and working examples to borrow and tweak

Topics include:

Release pipelines
Source control
Testing
Continuous Integration
Continuous Delivery and Deployment
PowerShell

Warren Frame, Harvard University

Warren Frame is an Infrastructure Engineer in Research Computing at Harvard University, Faculty of Arts and Sciences. He spends his days finding ways to minimize tedious and error prone work, writing incoherent commit messages, and occasionally, doing his job. He enjoys learning and sharing what he learns, often paired up with some poorly written PowerShell.

Michael Greene, Microsoft

Michael Greene is a Principal Program Manager at Microsoft in the Enterprise Cloud Group division. Michael is the PowerShell and DevOps lead for the CAT team (Customers, Architecture, and Technology). He drives customer feedback in the areas of management and automation, creates content to improve the customer experience, and provides a connection to engineering for projects that are adopting new products and technologies. Previously he worked in Office 365 operations where he gained experience using PowerShell to manage environments at cloud scale.

LISA Lab (continued)

Back Bay Ballroom D

Core Skills: Scripting for Automation

Wednesday, 2:00 pm–5:30 pm PST

Mike Ciavarella, Coffee Bean Software Pty Ltd

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

This class is a practical crash course in how—using a combination of bash, Perl, and friends—you can write useful scripts that solve real-world system administration problems.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and of how to edit files in your favorite flavor of *nix is assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete in-class tasks. Time in the LISA lab will also be scheduled to complement this class.

Who should attend:
Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:
Understanding of common scripting patterns and techniques

Topics include:
An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

5:30 pm–7:00 pm

Expo Happy Hour

Sponsored by Apple
Grand Ballroom Complex

7:00 pm–11:00 pm

Birds-of-a-Feather Sessions

View the full schedule of BoFs on the LISA16 BoFs page.

Thursday, December 8, 2016

7:30 am–5:00 pm

On-Site Registration and Badge Pickup

Grand Ballroom Foyer

7:30 am–9:00 am

Continental Breakfast

Grand Ballroom Foyer

9:00 am–10:30 am

Keynote Address

Constitution Ballroom

The Future of Engineering Tools and Techniques in Operations

Thursday, 9:00 am–10:30 am PST

Mitchell Hashimoto, HashiCorp

Available Media

We're currently undergoing major changes across development and operations that are pushing the boundaries of our comfort zone. While the change keeps coming, trends and practices have been emerging that show promise as the way we can tame this complexity. In this talk, I'll present the changes we're seeing, why we're seeing them, the ideas being introduced to manage this change, and the glorious future we're all heading towards.

Mitchell Hashimoto, HashiCorp

Mitchell Hashimoto is the founder of HashiCorp and creator of popular DevOps tools such as Vagrant, Packer, Terraform, Vault, and more. Mitchell is an O'Reilly author and is also one of the top GitHub users by followers, activity, and contributions. Mitchell was part of Inc.'s 30 Under 30. "Automation obsessed," Mitchell solves problems with as much computer automation as possible.

10:00 am–2:00 pm

LISA Expo

Grand Ballroom Complex

10:30 am–11:00 am

Break with Refreshments

Grand Ballroom Foyer

11:00 am–12:30 pm

Talks I

Constitution Ballroom A

Passing the Console: Fostering the Next Generation of Ops Professionals

Thursday, 11:00 am–11:45 am PST

Alice Goldfuss, New Relic

Available Media

Where does one go to learn ops? There are no sysadmin degrees or operations bootcamps. We learn our trade like dark witches, through varied and arcane means: books, talks, hands-on experiences, real-life outages, and backchannel lore. But tech is a thriving industry, ripe for eager new ops professionals and ideas. You have very likely met the person or people who will follow in your footsteps. So, how do we pass on our craft?

This talk will explore what it means to be in operations and how to pass on that unique skill set. We will discuss what tools and culture define our field, who to pass this knowledge on to, and how to do this effectively. Believe it or not, you are capable of contributing to the greater ops legacy in a variety of ways.

Come learn how to pass the torch—or console—and prepare the next generation of ops.

Alice Goldfuss, New Relic

Alice Goldfuss, codename kdumpster, is a Site Reliability Engineer at New Relic. She’s consulted on some books (Docker: Up & Running, Effective DevOps), presented at some conferences (LISA, SREcon, Velocity), and ran another one (DevOps Days Portland). One time she made Elasticsearch green; it was the happiest two minutes of her life. You can follow her on Twitter (@alicegoldfuss), but you’ll probably regret it.

Heresy in the Church of Docker

Thursday, 11:45 am–12:30 pm PST

Corey Quinn, The Quinn Advisory Group

Available Media

Docker (and by extension, microservices-based architecture) has expanded our horizons with respect to how the industry builds and supports applications at scale, which helps to explain why so many people seem willing to throw away decades of experience in favor of untested tools and barely functional design principles.

In this entertaining and somewhat irreverent talk, Corey presents the "other side" of the containerization craze: how configuration management fits into a world consumed by the DockerDockerDocker madness, how "I'll run this container in production" can blow up in your face when you least expect it, and how promising technologies should perhaps be vetted a bit more thoroughly before you throw away decades of hard-won experience supporting traditional architectures.

Corey Quinn, The Quinn Advisory Group

Principal at The Quinn Advisory Group, Corey has a history as an engineering manager, public speaker, and advocate for cloud strategies which speak to company culture. He specializes in helping companies control and optimize their AWS cloud footprint without disrupting the engineers using it.

Outside of his professional work, Corey is known for overdressing, telling entertaining stories, and carrying a cigarette case full of drink umbrellas.

Talks II

Constitution Ballroom B

TTL of a Penetration

Thursday, 11:00 am–11:45 am PST

Branson Matheson, Cisco Systems, Inc.

Available Media

In the world of information security, it's not a matter of if anymore...it’s a matter of when. With the advent of penetration tools such as Metasploit and AutoPwn, plus the day-to-day use of insecure operating systems, applications, and websites, reactive systems have become more important than proactive systems. Discovering a penetration through out-of-band processes, determining when and how it happened, and then mitigating the particular attack has become a stronger requirement than active defense. I will discuss the basic precepts of this idea and expand on them with various types of tools that help resolve the issue. Attendees should be able to walk away from this discussion and apply the knowledge immediately within their environment.

Branson Matheson, Cisco Systems, Inc.

Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson has several credentials and generally likes to spend time responding to the statement "I bet you can't...."

What I Learned from Science-ing Four Years of DevOps

Thursday, 11:45 am–12:30 pm PST

Nicole Forsgren, DORA

Available Media

Four years, over 20,000 DevOps professionals, and some science... What did we find? Well, the headline is that IT does matter if you do it right. In this fun interactive session, Nicole will discuss ways to make your data better, some surprises the team has seen over the years, and then some highlights from the research: With a mix of technology, processes, and a great culture, IT contributes to organizations' profitability, productivity, and market share. We also found that using continuous delivery and lean management practices not only makes IT better—giving you throughput and stability without tradeoffs—but it also makes your work feel better—making your organizational culture better and decreasing burnout.

Nicole Forsgren, DORA

Dr. Nicole Forsgren is an IT impacts expert who shows leaders and practitioners how to unlock the potential of technology change in their organizations. Best known for her work with tech professionals and as the lead investigator on the State of DevOps Reports, she is CEO and Chief Scientist at DORA (DevOps Research and Assessment) and an Academic Partner at Clemson University. In a previous life, she was a professor, sysadmin, and hardware performance analyst.

Talks III

Back Bay Ballroom AB

Data Has Always Been Big

Thursday, 11:00 am–11:45 am PST

Kyle Erf, Software Engineer, MongoDB, Inc.

Available Media

In the tech sector, we pride ourselves on innovating. But if beating the past is our goal, then why do most of us only have a view of the past that extends back a few decades? Seemingly every day, another article is published explaining Big Data, our generation’s new struggle to manage more information than it can readily process. Big Data, however, is nothing new—most societies in human history struggled with this very same problem, from the Fertile Crescent to the Industrial Revolution. My talk will present a brief overview of how and why the amount of available information has always outpaced our ability to fully comprehend it.

This talk will give a brief history of handling information and how humanity’s solutions for dealing with information always end with the creation of even more information, leading to the perpetual feeling of having "too much data to deal with" that brings us to our current Big Data situation. I will include historical points such as

the birth of the written word and the death of memorizing everything (and how mad people were about it)
early means of "backing up" the world’s writing
how Catholic monks invented alphabetical ordering
the explosion of knowledge due to Gutenberg’s printing press
the wacky tools researchers of the Renaissance used to organize the information overload of their day
early census machines and electronic databases

in order to place our current issues with information overload into this much larger timeline.

As a programmer at one of the leading "Big Data" companies, I'm sometimes uncomfortable with how the marketing speak and conference talks surrounding data storage always introduce Big Data as a new problem. While the technical specifics certainly are new, this feeling of information overload has existed since the dawn of written information. With this historical context in mind, we can better build for the future and avoid recreating the mistakes of those who came before.

SRE: It's People All the Way Down

Thursday, 11:45 am–12:30 pm PST

Courtney Eckhardt and Lex Neva, Heroku

Available Media

Is the root cause really “human error”? How did your environment let the human make the error? How did their error take down the service? How many outages did humans prevent? Can your dev teams’ priorities be aligned with reliability, instead of only with churning out features?

At Heroku, we do ops as a service—reliability is our product. If we go down, we take thousands of businesses with us. In SRE, we push for reliability and resiliency in designs, sure, but it’s more than that. We iterate on process, automation, tooling, and incident response, because people are at the heart of everything we do.

Courtney Eckhardt, Heroku

Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don Norman into a wide-reaching and humane concept of operational reliability.

Lex Neva, Heroku

Lex Neva is probably not a super-villain. He has six years of experience keeping large services running, including Linden Lab's Second Life, DeviantArt.com, and his current position as a Heroku SRE. While originally trained in computer science, he’s found that he most enjoys applying his software engineering skills to operations. A veteran of many large incidents, he has strong opinions on incident response, on-call sustainability, and reliable infrastructure design, and he currently runs SRE Weekly (sreweekly.com).

Mini Tutorial I

Commonwealth Ballroom

Interfacing with Humans: How to Manage in Prod Ops

Thursday, 11:00 am–12:30 pm PST

Connie-Lynne Villani, Grilled Cheese Invitational

Whether you call it Prod Ops, System Engineering, or simply "keeping it all working," ops managers face some particular challenges. How do you build new projects and services while solving all the production emergencies caused by the old, broken infrastructure? How do you juggle the demands of other teams in the company while keeping the site running? Above all, how do you give your team agency and keep them happy in a high-pressure, distraction-driven, 24x7 environment? You will learn practical skills and techniques for the human side of Ops in this mini-tutorial.

This tutorial breaks down prod ops management into three separate sections: managing teams, managing clients, and managing events, while providing tools and interactive practice for each area.

Who should attend:
The target audience for this talk is primarily managers (product, project, and people), or people looking to move into management. However, individual contributors are an excellent secondary audience for this talk because it provides real-life tools they can bring back to their team, and guidance on being a leader even if you aren't the manager.

Take back to work:
Attendees should bring back to work the following practical techniques for people management:

How to interview for the skills you want while avoiding bias.
How to coach underperformers, even when it's uncomfortable.
How to encourage growth.
How to distribute work equitably and manage time.
Techniques for avoiding burnout (for the manager and the ICs).
Three different techniques for collaboration (Fist to Five, Collaboration Contracts, and Parallel Thinking).
How to set expectations and evaluate resources when taking on new projects for your team.
How to prepare for and anticipate crisis, and keep a team functional during that time.

Topics include:

Hiring
Performance management
Product management
Tools for collaboration
Time management
On-Call Rotations
Incident Analysis

Connie-Lynne Villani, Grilled Cheese Invitational

With degrees in both Electrical Engineering and Theater Management, Connie-Lynne brings 20 years of System Engineering experience to the table, as well as a keen understanding of how to handle drama in the workplace. In addition to founding and managing Groupon's first SRE team, Connie-Lynne has worked at Linden Lab, Change.org, and Caltech, but admits that her most fun position is serving as a board member for the Grilled Cheese Invitational, an annual food festival celebrating all things cheesy.

Mini Tutorial II

Back Bay Ballroom C

Writing and Consuming REST Services

Thursday, 11:00 am–12:30 pm PST

Chris St. Pierre, Cisco Systems

REST services are widely used for interaction with and between applications and for systems management tasks. This mini-tutorial offers a quick introduction to how REST services are structured, for both the implementer and the client. We will cover the use of HTTP verbs, the architecture of URIs, maintenance of state, middleware, and more.

Who should attend:
People interested in gaining or solidifying their knowledge of how REST services work and how to interact with them in both manual and programmatic ways.

Take back to work:
Increased confidence in discovering, using, debugging, and writing REST services, for both systems management tasks and application interaction.

Topics include:
HTTP verbs, the architecture of URIs, maintenance of state, middleware, discoverability, error codes, asynchronicity, data submission, versioning, etc.

Chris St. Pierre, Cisco Systems, Inc.

Chris St. Pierre is currently serving the thirteenth year of a life sentence to hard labor at the command line. He works as an OpenStack engineer at Cisco and is a core contributor to Rally, the OpenStack benchmarking tool.

LISA Lab

Back Bay Ballroom D

Advanced Wireshark

Thursday, 11:00 am–12:30 pm PST

Brett Thorson, Senior Sales Engineer Mid-Atlantic, Dtex Systems

You've used Wireshark before to watch packets on the network, and maybe you've even written filters to get rid of all the noise you don't care about. But what about really advanced things like following a stream, listening to VoIP phone calls, pulling images out of captures, detecting duplicate DHCP offers, and finding downright shady stuff on the wire? We'll also show you why running Wireshark as root is a BAD idea. After this, you'll go back to work with new Wireshark skills that will have you finding irregularities on the network in no time. Before taking this class, I recommend that you be intimately familiar with Wireshark: you've run it before, tracked down some stuff, and written some Boolean filters. We're going to skip over that and dive into the more advanced features of Wireshark, including non-TCP/IP things too!

Who should attend:
System and network administrators with experience with Wireshark, who want to learn more.

Take back to work:

Follow streams
Capture images
Listen to VoIP
Detect duplicate DHCP offers

Topics include:

Capture Filters
What is a BPF, and why should I compile it?
Security issues & breaking the crap out of Wireshark
Statistics
Packet lengths—why you should care
IO Graph—find the noisy talkers
Endpoints—who’s talking to who
Listening to VoIP calls with Wireshark
Searching for clear text
Passwords
And anything else that might be interesting
What does a crappy network look like?
Troubleshooting
Use case #1—Loaded pcap—Tiny MTU
Shenanigans on the Wire (Dual DHCP servers)
Wrong broadcast address/netmask
IPv6 neighbor smacking

Brett Thorson, Senior Sales Engineer Mid-Atlantic, Dtex Systems

Brett comes to LISA by way of LISA Build. He has worked on the networks for several conferences, such as Shmoocon, IETF, and Network World + Interop iLabs. He's a huge hobbyist with an almost unending list of interests. Brett enjoys using Wireshark and other security tools to snoop into the places where errors and bugs hide. Brett is currently the Senior Sales Engineer for Dtex Systems for the Mid-Atlantic.

Vendor Talks

Republic B

NetOps Meets DevOps: Network Infrastructure Management Automation

Thursday, 11:00 am–11:45 am PST

Marcio Saito, CTO, Opengear

Available Media

This presentation will examine how technologies such as software-defined networking, zero-touch provisioning, orchestration, and NETCONF are transforming network management. Network engineers are being forced to evolve beyond the comfort of their vendor-specific CLI knowledge. They must learn new protocols, access APIs programmatically, and automate network provisioning and change and configuration management.

Marcio Saito, Opengear

Marcio Saito is the CTO of Opengear, a company providing resilience and out-of-band management solutions for network infrastructure. He was one of the pioneers of the open source movement in the early 1990s and has been working with data center technologies for over 20 years.

The Paradox of Software Craftsmanship

Thursday, 11:45 am–12:30 pm PST

Theo Schlossnagle, Circonus

Craftsmanship in software tends to erode as team sizes increase. This can happen for many reasons, but it often depends on code base size, team size, and autonomy. In this session I'll talk about some of the challenges companies face as these things change. I'll also cover how to reshape teams, architectures, and the way people work to maintain software craftsmanship while still delivering product.

Theo Schlossnagle, Circonus

Theo founded Circonus in 2010, and continues to be its principal architect. After earning undergraduate and graduate degrees from Johns Hopkins University in computer science, he went on to research resource allocation techniques in distributed systems during four years of post-graduate work. In 1997, Theo founded OmniTI, which has established itself as the go-to source for organizations facing today's most challenging scalability, performance, and security problems. He was also the principal architect of the Momentum MTA, which is now the flagship product of Message Systems, Inc. Born from Theo's vision and technical wisdom, this innovation is transforming the email software spectrum.

A widely respected industry thought leader, Theo is the author of Scalable Internet Architectures (Sams) and a frequent speaker at worldwide IT conferences. Theo is a member of the IEEE and a senior member of the ACM. He serves on the editorial board of the ACM's Queue Magazine.

Theo resides in Maryland with his wife and three daughters. When speaking about his work, he remarks, "I like tackling hard problems and playing with big toys [computing equipment]."

2:00 pm–3:30 pm

Talks I

Constitution Ballroom A

Stealing the Best Ideas from DevOps: A Guide for Sysadmins without Developers

Thursday, 2:00 pm–2:45 pm PST

Thomas Limoncelli, StackOverflow.com, and Christina Hogan, Independent Consultant

Available Media

DevOps is not a set of tools, nor is it just automating deployments. It is a set of principles that benefit anyone trying to improve a complex process. This talk will present the DevOps principles in terms that apply to all system administrators, and use case studies to explore their use in non-developer environments.

Thomas Limoncelli, StackOverflow.com

Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration, and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com. He's previously worked at Google, Bell Labs/Lucent, AT&T, and others. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom. He lives in New Jersey.

Christina Hogan, AT&T

Christina Hogan is one of the authors of The Practice of System and Network Administration and The Practice of Cloud System Administration. She has twenty years of experience in system administration and network engineering, from Silicon Valley to Italy and Switzerland. She has gained experience in small startups, mid-sized tech companies, and large global corporations. She worked as a security consultant for many years and her customers included eBay, Silicon Graphics, and SystemExperts. In 2005 she and Tom Limoncelli shared the SAGE Outstanding Achievement Award for the first edition of The Practice of System and Network Administration. She has a bachelor’s degree in mathematics, a master’s degree in computer science, a doctorate in aeronautical engineering, and a diploma in law. She also worked for six years as an aerodynamicist in a Formula 1 racing team and represented Ireland in the 1988 Chess Olympiad. She currently works as a Principal Engineer at AT&T and lives in Switzerland.

Code Review for Operations

Thursday, 2:45 pm–3:30 pm PST

Spencer Krum, IBM and OpenStack

Available Media

Code review has been shown to help developers produce better code. It can also help SREs run more reliable systems. Our ops team is fanatical about using code review and about representing our infrastructure as code so that code review can be leveraged. In this presentation I will show how we use code review to manage our infrastructure, modify and create systems, administer services, and more. I'll discuss why we use code review for our operations work and where we get value from it. I'll also show the path we took to get here, which actions couldn't be piped through code review, and what we're going to do next.

Spencer Krum, IBM and OpenStack

Spencer (nibalizer) Krum (http://spencerkrum.com) has been sysoping Linux since 2010. He works for IBM contributing upstream to OpenStack and Puppet. Spencer is a core contributor to the OpenStack Infrastructure Project. Spencer coordinates the local DevOps user group in Portland and volunteers for an ops-training program at Portland State University called the Braindump. Spencer is a published author and frequent speaker at technical conferences. Spencer is a maintainer for the voxpupuli effort (https://voxpupuli.org), which attempts to bring together a network of Puppet developers, modules, and infrastructure.

Spencer lives and works in Portland, Oregon, where he enjoys tennis, cheeseburgers, and StarCraft II.

Talks II

Constitution Ballroom B

Dyn, DDoS, and the DNS

Thursday, 2:00 pm–2:45 pm PST

Chris Baker, Dyn, Inc.

Available Media

On October 21st, Dyn was the target of a large distributed denial of service (DDoS) attack. This attack disrupted interactions within a large swath of the internet and, suddenly, everyone in the media was talking about the DNS. This talk will cover the DNS landscape and the role that it plays, as well as the shifting trends in distributed denial of service attacks.

Chris Baker, Dyn, Inc.

Chris Baker is an Internet cartographer, data analyst, and wanderlust researcher at Dyn, where he is responsible for an array of data analysis and research projects ranging from business intelligence to Internet measurements and communication analysis. Previously, Chris worked at Fidelity Investments as a senior data analyst. He graduated from Worcester Polytechnic Institute with a master’s degree in system dynamics and a bachelor’s degree in management of information systems and philosophy.

Linux 4.X Tracing Tools: Using BPF Superpowers

Thursday, 2:45 pm–3:30 pm PST

Brendan Gregg, Netflix

Available Media

The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: BPF. Formerly the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.

In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix.

Brendan Gregg, Netflix

Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He was previously a performance lead and kernel engineer at Sun Microsystems, where he developed the ZFS L2ARC and led performance investigations. He has also created numerous performance analysis tools, which have been included in multiple operating systems. His recent work includes developing methodologies and visualizations for performance analysis.

Talks III

Back Bay Ballroom AB

Why We Can't Have Nice Things: A Tale of Woe and a Hope for the Future

Thursday, 2:00 pm–2:45 pm PST

Pete Cheslock, Threat Stack

Available Media

Computers are hard, and security is even harder. While you’re building a bespoke host-based intrusion detection system to monitor for advanced persistent threats, vulnerabilities are uncovered in 30-year-old core Unix programs. Even worse, the junior-level operations engineer who can (accidentally) provision thousands of systems and blow your budget away is the same person who can make one small change to a security group that opens all access to your back-end systems.

The cloud is making it easier than ever to provision systems to meet your infrastructure needs—and to do so very quickly. Speed to market is a major competitive advantage that many companies are leveraging through the concept of Infrastructure as Code. Provisioning hundreds or thousands of compute instances in mere minutes is now considered an everyday activity. Everyone wants to move fast.

The long contested battlefield of remote access to production machines has only gotten uglier since the rise of the Cloud, which has obliterated the line between building the system and running the system. “Lock out the developers” is not an acceptable policy anymore. Developers inherently build better systems when they experience running them.

Continuous Integration. Continuous Deployment. But who (or what) is continually monitoring the state of your operational security?

We’ll discuss the role of security in this new *aaS landscape. We’ll talk about things to do when you have a dedicated InfoSec team, and tools you can use when you don’t. We’ll explore what it means to build in security the same way you build in quality as part of your continuous delivery pipelines. We'll also cover how you can strengthen your security posture while maintaining your ability to move quickly and deliver value to your customers.

Pete Cheslock, Threat Stack

As the head of Threat Stack's operations and support teams, Pete is focused on delivering the highest level of service, reliability, and customer satisfaction to Threat Stack's growing user base. An industry veteran with over 15 years' experience in operations, Pete understands the challenges and issues faced every day by security, development, and operations professionals, and how Threat Stack can help. Prior to Threat Stack, Pete held senior positions at Dyn and Sonian, where he built, managed, and developed automation and release engineering teams and projects.

Hard Knocks and Soft Spots: A Docker-Centric CI/CD Pipeline at VMware

Thursday, 2:45 pm–3:30 pm PST

Fabio Rapposelli, Staff Engineer 2, VMware, and Ivan Porto Carrero, Staff Engineer, VMware

Available Media

VMware’s R&D team is made up of thousands of top-notch computer scientists and software engineers who are always eager to write and test code that ships. The Cloud Native Apps BU recently started working on two new projects and decided to revisit how we ship code. We’ve built a new CI/CD platform based on Docker using Drone. We use it not only to package our services but also to run reproducible end-to-end tests, giving developers a perfectly reproducible test experience. Our talk will focus on how Docker makes our lives easier. We’ll go over the platform we’ve built, the pipeline, lessons learned, and next steps.

Fabio Rapposelli, Staff Engineer 2, VMware

Fabio is a Staff Engineer 2 working in the Cloud Native Apps BU, currently working on Project Cello, a platform for self-operating stateful applications. As part of his duties he has built and maintains the CI/CD environment used by the team.

Mini Tutorial I

Commonwealth Ballroom

Living in a Post-Post-Mortem World: Techniques for Incident Analysis

Thursday, 2:00 pm–3:30 pm PST

Connie-Lynne Villani, Grilled Cheese Invitational

Post-mortems are a great start to incident analysis, but are they always necessary? How do you sift through the information to produce a good incident report that people will actually read and act on? Do you even need to produce an incident report, or can you just make a fix and get on with life? What are you forgetting to include, and what are you including that isn’t necessary?

This mini-tutorial will help you answer these questions, with an emphasis on:

How to prepare for incident analysis before anything goes wrong
Learning from systemic failure
Anatomy of a good written incident analysis
The myth of root cause analysis
Conducting an incident review meeting
Studying both "what went wrong" and "what went right"
Developing a culture of responsibility without blame

Who should attend:
Sysadmins whose job focus is site or application stability, or anyone who's ever had to explain "what went wrong."

Take back to work:
After this tutorial, attendees will take with them:

Real-world examples of good incident reviews
Templates for lightweight and in-depth incident analysis
Patterns and anti-patterns for post-event retrospectives
Techniques for holding civil discussions about failure and improvement
Techniques for improving system reliability by learning from success

Topics include:

Incident and root cause analysis
Technical writing
Agile retrospectives
Failure recovery
Monitoring and alerting
Event response

Connie-Lynne Villani, Grilled Cheese Invitational

With degrees in both Electrical Engineering and Theater Management, Connie-Lynne brings 20 years of System Engineering experience to the table, as well as a keen understanding of how to handle drama in the workplace. In addition to founding and managing Groupon's first SRE team, Connie-Lynne has worked at Linden Lab, Change.org, and Caltech, but admits that her most fun position is serving as a board member for the Grilled Cheese Invitational, an annual food festival celebrating all things cheesy.

Mini Tutorial II

Back Bay Ballroom C

The 90-Minute Cassandra DBA

Thursday, 2:00 pm–3:30 pm PST

Chris McEniry, Sony Interactive Entertainment

The last decade has seen a rise in alternatives to traditional relational database systems, with a focus on distributed systems that perform at high speed and scale. Cassandra is one of the leading contenders in this space. It has been shown to be reliable and linearly scalable, and it can be distributed across commodity compute and storage resources. These properties have driven demand for its use in mission-critical applications, especially high-volume distributed applications such as those seen at "web scale" and in IoT. This mini-tutorial provides a simple introduction to Cassandra for those who have been tapped to support it or who want to understand the common concepts of this generation of databases.

Who should attend:
Sysadmins who are interested in administering Cassandra, or who have been volunteered to administer it

Take back to work:
Basic understanding of how Cassandra runs and how to support it

Topics include:
Basic Running, Data Modeling, Partitioning, Patterns, Monitoring, Clients

Chris McEniry, Sony Interactive Entertainment

Chris "Mac" McEniry is a practicing sysadmin and architect responsible for running a large E-commerce and gaming service. He's been working and developing in an operational capacity for 15+ years. In his free time, he builds tools and thinks about efficiency.

LISA Lab

Back Bay Ballroom D

Writing (Micro)Services with Flask

Thursday, 2:00 pm–5:30 pm PST

Chris St. Pierre, Cisco Systems, Inc.

Flask is a Python web microframework whose efficient, streamlined feature set makes it an excellent fit for writing microservices that provide a friendly, eminently usable interface to systems management functions.

In this hands-on, interactive tutorial we will write a real, actual RESTful service to retrieve iostat(1) data from the server. Participants will be supplied with skeleton code and unit and functional tests, and will be walked through the process of writing the application to satisfy the tests in a TDD manner.

Participants must bring their own device capable of running Python, and should have at least basic familiarity with Python and RESTful services.

Who should attend:
Sysadmins interested in providing an internet-accessible, automatable interface to complex (or simple) systems management functions.

Take back to work:
The knowledge and demonstrated ability to write RESTful services in Flask.

Topics include:
Basic retrieval and mutation requests, asynchronous processing, database interface, and tracking state

Chris St. Pierre, Cisco Systems, Inc.

Chris St. Pierre is currently serving the thirteenth year of a life sentence to hard labor at the command line. He works as an OpenStack engineer at Cisco and is a core contributor to Rally, the OpenStack benchmarking tool.

3:30 pm–4:00 pm

Break with Refreshments

Grand Ballroom Foyer

4:00 pm–5:30 pm

Talks I

Constitution Ballroom A

Implementing DevOps in a Regulated Traditionally Waterfall Environment

Thursday, 4:00 pm–4:45 pm PST

Jason Victor and Peter Lega, Merck and Co., Inc.

Available Media

DevOps has been adopted in so many places, and its benefits are well documented, yet it is not getting the same traction in regulated environments. Is it truly impossible to implement DevOps at a regulated company when someone else makes the rules? Or is it possible to challenge the status quo and still adhere to essential compliance and risk requirements?

We will explain why regulated companies such as Merck, a 125-year-old pharmaceutical company, find it hard to change course. We will walk through the complexities of some of these regulations to give a better understanding of the challenge. We will also show how the "path of most resistance" becomes the default release management strategy trap.

Join us midway on our multi-year journey to augment our traditional, waterfall methodology with DevOps/Agile culture and methodology. We will talk about our approach, our tool chain, and how we changed peoples’ minds from "that will never work" to "that's the new way to work."

We hope you will walk away from this talk with ideas on how to implement DevOps and overcome your own company's obstacles to change.

Jason Victor, Merck & Co., Inc.

Jason Victor is an Associate Director in Merck’s Applied Technology department, with responsibility for organizing DevOps strategy, finding and maintaining open source partnerships, and platform architecture.

Jason joined Merck in 2001, recruited from TCNJ. Since then, he has earned a master's degree from Drexel University and defined the ITIL implementation at Merck. He has also supported help desk platforms and engaged in solution and enterprise architecture for the corporate and research divisions. He now focuses on Merck’s DevOps strategy.

Peter Lega, Merck & Co., Inc.

Peter Z. Lega is Director of Emerging Technology at Merck & Company, leading the execution and tooling of the emerging technology portfolio, and cultivating creative partnerships around technology, talent, and data. Currently, Lega is working on evolving software practices to support rapid-paced global technology delivery in a regulatory environment.

Before this position, Lega led MSD’s enterprise web and mobility services. He led the construction of the Univadis Portal, serving thousands of healthcare professionals across the globe, which was recognized by the ComputerWorld Laureates program.

Prior to joining MSD, he was VP of technical architecture at c|net networks, where he led development of several franchise sites (shareware.com, download.com, and buydirect.com). He also held senior roles including Technical Director at Digital Equipment and Divisional CIO at Bear Stearns, and was a rapporteur on Digital Content for the European Commission.

Peter holds a B.Sc. in Computer Science from Moravian College.

Catch Fire and Halt: Fire in the Datacenter

Thursday, 4:45 pm–5:30 pm PST

Jon Kuroda, University of California, Berkeley

Available Media

What do you do when a fire in the datacenter takes your entire organization down until you can recover? We found out the hard way on Friday, September 18, 2015, when one of our research group’s servers caught fire at the UC Berkeley campus datacenter. The fire activated the facility's fire suppression and emergency power-off systems, causing an outage of nearly all campus-hosted online services, and recovery efforts lasted through the weekend. We will detail the circumstances surrounding the incident itself and examine the post-mortem process that followed. We will also compare our experiences with those of other engineering disciplines after a critical incident.

Jon Kuroda, University of California, Berkeley

Jon is a sysadmin and research engineer at the Department of Electrical Engineering at the University of California, Berkeley where he spends his days (and nights) puzzling over HDFS/Spark clusters, debugging business process, and trying to keep datacenter spaces clean(er) and more usable all while trying to keep up with dozens of computer science researchers.

Talks II

Constitution Ballroom B

The 7 Righteous Fights

Thursday, 4:00 pm–4:45 pm PST

Heidi Waterhouse, Documentation Mercenary

Available Media

Usually we think of compound interest as what adds magically to our retirement or makes our student loans last forever. But there is also a compound interest of technical debt, where a project is made harder and more expensive because of early "cost-saving" choices.

I think it's empowering for developers and other people involved in the inception of a project to have tools for making the project better long-term.

The seven things I think should be considered very early in development are:

Localization. Are you ever planning on selling this to someone in another country?
Security. Don't be the organization that has to pay someone for disaster PR. Building in security early saves you a bunch of time and user churn later.
Extensibility. What makes you so sure this API will always be internal?
Documentation. People do not buy software solely based on PowerPoints. You need public docs. The docs have to be more useful than Stack Overflow.
Affordance. UI is not a word. The microtext matters.
Acceptance. Have you shown this to any actual humans who are like the users?
Accessibility. We all use computers different ways. Does your software allow that?

I expect this talk will be relevant to both senior people working on leading project teams, and empowering for juniors who don't have a structure for critiquing usability problems. I want people to leave with an understanding of how small changes in the initial trajectory of a project can lead to greatly improved outcomes.

Heidi Waterhouse, Documentation Mercenary

Heidi is a technical documentation mercenary who specializes in setting up documentation for new products and companies. She is also a globe-spanning speaker who loves to explain to developers how to make their lives easier in the long run. When she's not writing nerdy documents, she helps raise two kids, sews her own clothes, and cycles in Minnesota, even in the winter.

Intelligent Anomaly Detection in Heterogeneous Internet Services

Thursday, 4:45 pm–5:30 pm PST

Dong Wang, Baidu Inc.

Available Media

When talking about anomaly detection in Internet services, most of us imagine a scenario in which lots of curves track various metrics and fixed thresholds signal when something is wrong. However, such simple methods are far from effective nowadays. In this talk I will address a number of intelligent, machine-learning-based approaches to anomaly detection across many heterogeneous Internet services. All of the services mentioned here are from Baidu, one of the top IT companies in the world. Baidu's business includes search, location-based services (LBS), finance and payment, and more, and together these services have more than one billion users. The approaches described in this talk are already in active use in Baidu’s real products.

Dong Wang, Baidu Inc.

Dong Wang is a principal architect at Baidu, the largest search engine in China, and has led Baidu’s SRE team to work on some challenging projects, such as automatic anomaly detection and issue fixing in large scale Internet sites. He is also interested in user experience improvement in the mobile Internet services. Prior to Baidu, he worked at Bell Labs and Google for more than 15 years in total.

Talks III

Back Bay Ballroom AB

No User Left Behind: Making Sure Customers Reach Your Service

Thursday, 4:00 pm–4:45 pm PST

Mohit Suley, Software Engineer, Live Site Engineering team, Bing

Available Media

Can you detect users behind a single ISP, in a single city, losing access to your service because of a network configuration error? If a low-tier ISP in Brazil blocked you accidentally, affecting a few thousand customers, how will you find out before customers switch? This talk will define the concept of reachability, show ways to detect globally minuscule but customer-affecting outages across geographies and ISPs, and provide practical examples of how you can extend the definition of availability to include your users, not just the systems you built.

Mohit Suley, Bing.com

Mohit is an Availability Engineer on Bing's Live Site Engineering team. By day, he investigates all the issues that subtly affect Bing’s availability and performance. Designing systems that proactively improve availability and route around problems is a core mission of the team. He loves long walks, talking about end-user availability, and exploring how network-level data can tell interesting stories about customer experience in aggregate. R is his go-to data analysis tool these days. He never lets an opportunity to dive into network flows, architecture issues, or scaling problems go by.

From BOFH to Just Another Person in the Standup, Surviving the Move to DevOps

Thursday, 4:45 pm–5:30 pm PST

Jamie Riedesel, HelloSign

Available Media

Everything-as-a-Service is taking a big bite out of traditional IT, not just in jobs but in the nature of the job itself. Some of you will decide the time is right to learn some new skills and go work for one of those EaaS companies. Culturally, moving from traditional IT to small-team agile is a huge change. The techniques for maintaining psychological safety within TradIT can put you at risk of being fired under the no-asshole rule in software organizations. In this session, we’ll go over ways to reframe this safety reflex into new paths.

Jamie Riedesel, HelloSign

Jamie Riedesel is a DevOps Engineer at HelloSign. She has been performing acts of systems administration and engineering since 1997, and more dev-like things since 2010. She moved from corporate IT to the startup space in 2010 and experienced the good kind of culture shock. Jamie has been blogging as sysadmin1138 since 2004 and has been a community-elected moderator on ServerFault since 2010. She received the Chuck Yerkes community award from LOPSA in 2015.

Mini Tutorial I

Commonwealth Ballroom

Managing Dispersed Teams

Thursday, 4:00 pm–5:30 pm PST

Scott Cromar, Senior Manager, Convergys

Geographically dispersed teams have become a reality in most large workplaces. Organizations have pressed to reduce costs by shifting work from high cost locations to lower cost locations, and efficiencies in system management have allowed organizations to move away from hiring teams localized in a single site.

Managing a dispersed team brings its own set of challenges and opportunities. Cultural, linguistic, time zone, and other differences can cripple a team if not managed properly. But there are also opportunities that come from an increased diversity of viewpoints or follow-the-sun scheduling.

This workshop will help people in geographically dispersed teams to avoid some common pitfalls and to structure their work for success.

Who should attend:
People who have a leadership position on a technical team, or who aspire to a leadership position.

Take back to work:
Attendees will learn some techniques and mindsets important to the challenge of managing a team with dispersed members.

Topics include:

Common types of dispersed teams
Efficient communication
Coordination, collaboration, and control
Building cohesion
Cultural expectations and norms
Structuring processes and procedures for success
Scheduling and creating a workflow rhythm

Scott Cromar, Senior Manager, Convergys

Scott Cromar is an experienced IT manager who still remembers what it was like to step into his first leadership position from a technical role. He has assembled diverse, multifunctional, globally distributed operational teams for several employers over his career, and he enjoys the challenge of creating a team from a group of talented individuals.

Mini Tutorial II

Back Bay Ballroom C

Making Developers More Productive with Vagrant, VirtualBox, and Docker

Thursday, 4:00 pm–5:30 pm PST

John J. Rofrano, IBM T.J. Watson Research Center

One of the biggest time sinks in development is setting up your development and test environment. Whether you are new to a project and need to set up your workstation with everything required to start coding, or you just need a clean environment to test in, installing all of the software for a complete development environment is always time-consuming. Every hour spent installing software is an hour you’re not delivering value to the customer. Learn how to leverage the powerful trio of Vagrant, VirtualBox, and Docker containers to create instant development and test environments right on your laptop with this mini-tutorial by John Rofrano.

This tutorial covers the fundamentals needed to understand and leverage each of the three technologies:

Vagrant and VirtualBox:

Installing VirtualBox and Vagrant on your laptop
Getting started with Vagrant basic commands
Understanding how to modify your Vagrantfile
Setting up a virtual network and forwarding ports so that your application appears to be running locally on your laptop
Techniques and strategy for installing the required software for development
Using a shared filesystem to keep your code physically on your laptop so that you can edit with your favorite desktop editor while running it virtually in a VM

Docker Containers:

Using Docker containers with Vagrant for supplying middleware
Getting started with Docker basic commands
Docker command-line parameters for sharing the file system and redirecting ports
Linking Docker containers to your code

Putting it all together:

Getting in and running your application
Delivering Vagrantfiles and Dockerfiles as part of your git repo
Documenting how new developers can easily get started with two commands!
Blow it all away and do it again

I have used these technologies on several DevOps projects at IBM Research, and my teams have always been productive from their very first day on the project. I never publish a project to GitHub without having these tools in place.

Who should attend:
The target audience for this tutorial is primarily developers and team leaders who need to work on Linux server environments on their laptop but don’t want to install all of the software physically on their laptop, or cannot because they don’t use Linux as their desktop OS. This would also be helpful to development managers who want to make their teams instantly more productive.

Take back to work:
Attendees should bring back the knowledge of how to make themselves and their development teams more productive including knowledge of:

How to install VirtualBox and Vagrant
How to set up Vagrant to spin up virtual development environments
How to find and leverage Docker containers to provide instant middleware
How to document procedures for developers to get up and running quickly

Topics include learning how to use:

VirtualBox
Vagrant
Docker
Git repos

John J. Rofrano, IBM T. J. Watson Research Center

John Rofrano is a Senior Technical Staff Member at IBM T.J. Watson Research Center, where he leads a team of researchers working on cloud native migration technologies leveraging IBM Bluemix, Cloud Foundry, Docker, DevOps pipelines, and building microservices using Python/Flask. He is part of an elite team of DevOps Champions who are fostering the DevOps culture at IBM.

John is also an Adjunct Professor at New York University teaching DevOps and Cloud courses. He is the author of numerous papers and patents in the field of computer science and several books on video editing and music creation.

LISA Lab (continued)

Back Bay Ballroom D

Writing (Micro)Services with Flask

Thursday, 2:00 pm–5:30 pm PST

Chris St. Pierre, Cisco Systems, Inc.

Flask is a Python web microframework whose efficient, streamlined feature set makes it an excellent fit for writing microservices that provide a friendly, eminently usable interface to systems management functions.

In this hands-on, interactive tutorial we will write a real, actual RESTful service to retrieve iostat(1) data from the server. Participants will be supplied with skeleton code and unit and functional tests, and will be walked through the process of writing the application to satisfy the tests in a TDD manner.

Participants must bring their own device capable of running Python, and should have at least basic familiarity with Python and RESTful services.

Who should attend:
Sysadmins interested in providing an internet-accessible, automatable interface to complex (or simple) systems management functions.

Take back to work:
The knowledge and demonstrated ability to write RESTful services in Flask.

Topics include:
Basic retrieval and mutation requests, asynchronous processing, database interface, and tracking state

Chris St. Pierre, Cisco Systems, Inc.

Chris St. Pierre is currently serving the thirteenth year of a life sentence to hard labor at the command line. He works as an OpenStack engineer at Cisco and is a core contributor to Rally, the OpenStack benchmarking tool.

6:30 pm–8:30 pm

LISA16 Conference Reception

Grand Ballroom Complex

8:30 pm–11:30 pm

Birds-of-a-Feather Sessions

View scheduled BoFs on the LISA16 BoFs page.

Friday, December 9, 2016

8:00 am–12:00 pm

On-Site Registration and Badge Pickup

Grand Ballroom Foyer

7:30 am–9:00 am

Continental Breakfast

Grand Ballroom Foyer

9:00 am–10:30 am

Keynote Address

Constitution Ballroom

Identifying Emergent Behaviors in Complex Systems

Friday, 9:00 am–10:30 am PST

Jane Adams, Two Sigma

Available Media

Forager ants in the Arizona desert have a problem: after leaving the nest, they don’t return until they’ve found food. On the hottest and driest days, this means many ants will die before finding food, let alone before bringing it back to the nest. Honeybees also have a problem: even small deviations from 35°C in the brood nest can lead to brood death, malformed wings, susceptibility to pesticides, and suboptimal divisions of labor within the hive. All ants in the colony coordinate to minimize the number of forager ants lost while maximizing the amount of food foraged, and all bees in the hive coordinate to keep the brood nest temperature constant in changing environmental temperatures.

The solutions realized by each system are necessarily decentralized and abstract: no single ant or bee coordinates the others, and the solutions must withstand the loss of individual ants and bees and extend to new ants and bees. They focus on simple yet essential features and capabilities of each ant and bee, and use them to great effect. In this sense, they are incredibly elegant.

In this talk, we’ll examine a handful of natural and computer systems to illustrate how to cast system-wide problems into solutions at the individual component level, yielding incredibly simple algorithms for incredibly complex collective behaviors.

10:30 am–11:00 am

Break with Refreshments

Grand Ballroom Foyer

11:00 am–12:30 pm

Talks I

Constitution Ballroom A

Lost Treasures of the Ancient World

Friday, 11:00 am–11:45 am PST

David Blank-Edelman, Apcera

Available Media

In the deep, not-so-dark recesses of a former employer's data center lives an ancient server. This server was central to their infrastructure for years before I arrived and was still in active use after I left, 19 years later. Scared yet?

With the permission of its owner, I began an archeological excavation with this server as my dig site. What could I learn by studying the contents of a machine that was the backbone of the environment for at least 25 years? How has system administration changed over that time period? How has it stayed the same? What mistakes were made? What have we learned since then and what have we forgotten? Could it help us understand the future of our current state-of-the-art practices? All of this and more, my friends. All of this and more.

Dead servers tell no tales, but this server isn't dead yet. Come hear what the past wants to tell us about our future.

David Blank-Edelman, Apcera

David is the Technical Evangelist at Apcera. He has spent close to thirty years in the systems administration, DevOps, and SRE field in large multi-platform environments including Brandeis University, Cambridge Technology Group, MIT Media Laboratory, and Northeastern University. He is the author of the O'Reilly Otter book Automating System Administration with Perl and is a frequent invited speaker at conferences in the field. David is honored to serve on the USENIX Board of Directors. He prefers to pronounce Evangelist with a hard 'g'.

Zero Trust Networks: Building Systems in Untrusted Networks

Friday, 11:45 am–12:30 pm PST

Evan Gilman, PagerDuty, Inc.

Available Media

Let's face it: the perimeter-based architecture has failed us. Today's attack vectors can easily defeat expensive stateful firewalls and evade IDS systems. Perhaps even worse, perimeters trick people into believing that the network behind them is somehow "safe," despite the fact that chances are overwhelmingly high that at least one device on that network is already compromised.

It is time to consider an alternative approach. Zero Trust is a new security model, one which considers all parts of the network to be equally untrusted. Taking this stance dramatically changes the way we implement security systems. For instance, how useful is a perimeter firewall if the networks on either side are equally untrusted? What is your VPN protecting if the network you're dialing into is untrusted? The Zero Trust architecture is very different indeed.

In this talk, we'll go over the Zero Trust model itself, why it is so important, what a Zero Trust network looks like, and what components are required in order to actually meet the challenge.

Evan Gilman, PagerDuty, Inc.

Evan is currently a Site Reliability Engineer at PagerDuty. With roots in academia, he finds passion in both reliable, performant systems, and the networks they run on. When he's not building automated systems for PagerDuty, he can be found at the nearest pinball table or working on his upcoming book, Zero Trust Networks.

Talks II

Constitution Ballroom B

The Devopsification of Windows Server 2016

Friday, 11:00 am–11:45 am PST

Jeffrey Snover, Microsoft

Available Media

Everyone knows that DevOps is not about technology; it is about culture and process. But some technologies make certain processes and cultures difficult, and other technologies make them easy. This session explores why and how Windows Server 2016 was developed with DevOps in mind and what this means to customers adopting a DevOps workflow. WS2016 is the largest architectural change since NT and lays the foundation for a cloud-paced world. It introduces:

A new "just enough OS" base: Nano Server
A new way to package apps: Windows Server Applications (WSA)
A new way to configure systems & applications: Desired State Configuration (DSC)
A new way to find, download and set up app/tool repositories: OneGet
A new way to test code and the operational validation of environments: Pester and OperationValidation Framework
A new way to ensure secure operations: Just Enough Admin
New remoting: OpenSSH
New isolation and mgmt. model: Containers and Docker on Windows Server

Consul as a Monitoring Solution

Friday, 11:45 am–12:30 pm PST

Seth Vargo, Director of Evangelism, HashiCorp

Available Media

There are two sides to monitoring—exposing problems with alerts and acting upon those alerts to find solutions to the exposed problem. For exposing problems, users can define any script for Consul to intelligently check and report the health status of all nodes in a cluster. These scripts could be as simple as returning a 200, or as complex as querying the load and query response time on a database server. Other monitoring solutions already provide such functionality, but where Consul shines is in the second half of monitoring—automatic intervention to find solutions to problems without human operators.

Since Consul has built-in health checking, it not only notifies operators of a node or service failure but also automatically routes traffic away from unhealthy nodes. Consul can also route traffic back to a troubled node once the node reports that it is healthy again. In this way, Consul pushes beyond the existing paradigms of monitoring, making it much more than a simple notification system: it surfaces problems and solves them without human intervention. Don’t worry about that pager going off in the middle of the night; rest easy with Consul.
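The check scripts mentioned above report health through their exit code: Consul treats 0 as passing, 1 as warning, and any other value as critical. As an illustration only, the following Python sketch probes an HTTP endpoint and maps the response the way Consul's built-in HTTP checks do (any 2xx is passing, 429 Too Many Requests is a warning, everything else is critical). The script name `check_http.py`, the two-second timeout, and the choice to exit only when a URL argument is passed are assumptions of this sketch.

```python
import sys
import urllib.error
import urllib.request

# Consul script checks read the exit code: 0 = passing, 1 = warning,
# any other value = critical.
PASSING, WARNING, CRITICAL = 0, 1, 2


def status_to_exit_code(status: int) -> int:
    """Map an HTTP status to an exit code, mirroring Consul's HTTP checks."""
    if 200 <= status < 300:
        return PASSING
    if status == 429:  # Too Many Requests: degraded rather than down
        return WARNING
    return CRITICAL


def check(url: str, timeout: float = 2.0) -> int:
    """Probe url once and return the exit code Consul should see."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_to_exit_code(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses (after redirects).
        return status_to_exit_code(err.code)
    except OSError:  # refused connection, DNS failure, timeout, ...
        return CRITICAL


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only runs when a URL is passed, e.g. `python3 check_http.py <url>`.
    sys.exit(check(sys.argv[1]))
```

For a plain "return a 200" probe, Consul's built-in HTTP check already does this; a script like this earns its place when the health test needs extra logic, such as the database load and query response time checks the abstract mentions.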

Seth Vargo, Director of Evangelism, HashiCorp

Seth Vargo is the Director of Evangelism at HashiCorp. Previously, Seth worked at Chef (Opscode), CustomInk, and a few Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, or speaking at conferences, Seth enjoys spending time with his friends and advising non-profits. He loves all things bacon.

Talks III

Back Bay Ballroom AB

Everything Is Terrible, and We're to Blame

Friday, 11:00 am–11:45 am PST

Jim Perrin, The CentOS Project

Available Media

A quick look around the IT industry tells us that it's primarily held together with duct tape and bubblegum, and as we drive technology into more places, the problem is only getting worse. We laugh at companies who don't do things "right" or empathize with sysadmins caught in the middle, but when it comes down to it, WE are the ones ultimately responsible.

Jim Perrin, The CentOS Project

Jim has been a consultant for both the Energy and Defense industries as well as a few startups. Through this experience he has gained unique insight into all manner of creative horrors done to computers in the name of progress. He currently spends his days serving as a board member of The CentOS Project and maintaining the AArch64 build of CentOS Linux.

Using Open Source Telemetry to Drive Change in Management Environments

Friday, 11:45 am–12:30 pm PST

Susan Young, Office of the CTO, Dell EMC

Available Media

This talk will discuss the different dimensions of telemetry as a means of gathering system intelligence and driving operator and automated changes in a modern data center. As part of the presentation, we will look at the role of different open source telemetry frameworks in the context of telemetry collection, processing, and publishing. The talk also looks at examples of the use of system telemetry as a data source through the lens of different problem spaces such as system automation, the Internet of Things (IoT), and machine learning.

Attendees will take back knowledge of how to use open source telemetry frameworks, as well as applicable patterns, algorithms and protocols to create telemetry-based solutions that solve specific business problems. Specifically they will acquire knowledge of how to incorporate telemetry into solutions that increase the value of their infrastructure through greater automation and the creation of data streams that can be used to drive data insights.

Susan Young, Office of the CTO, Dell EMC

Susan Young is a Senior Consultant Technologist in Dell EMC’s Global Office of the CTO, focused on Converged Infrastructure Management and Orchestration. In her current role, she engages with multiple Dell EMC business units and companies in developing solutions and platforms for Converged Infrastructure. Prior to joining the Global Office of the CTO, Susan was engaged on a Labs-as-a-Service initiative within Dell EMC that was focused on delivering self-service lab automation for Dell EMC’s own development and test engineers. As a result of this collective experience, she has a unique perspective on the challenges of driving change automation and data analysis through the consumption of telemetry and other system-derived data sources.

Mini Tutorial I

Commonwealth Ballroom

Security Compliance for Containers and VMs with OpenSCAP

Friday, 11:00 am–12:30 pm PST

Martin Preisler, Red Hat

The core focus of this mini-tutorial is how to do a SCAP evaluation of containers and virtual machines that are part of infrastructures deployed in production.

SCAP is a set of specifications related to security compliance. The primary use-case is to ensure a system is configured according to a predefined policy. It is heavily used in government, defense, and finance industries. In this tutorial we will go through all the necessary steps towards a continuous compliance setup of an infrastructure. We will start by installing the tools and preparing the SCAP content. Then we will proceed to scan a single machine for compliance, further refining the content. After that we will discuss differences between scanning a bare-metal machine, virtual machine, and a container. Then we will explore how to scan continuously and how to scan multiple instances at once.

For vulnerability scans we will be using Red Hat Enterprise Linux 6 and 7. For security compliance we will use United States Government Configuration Baseline and Payment Card Industry policies as examples.

Who should attend:
System administrators, especially from government contractors and the defense, finance, and telecommunication industries; decision makers who need security compliance for regulatory purposes or for proactive security; DevOps practitioners interested in proactive security

Take back to work:

What is SCAP? Where can it be used?
Where do I get SCAP content? Where do I get the tools?
How to use SCAP for automated vulnerability scans
How to use SCAP for automated security policies
Customizing existing SCAP content for specific deployments

Topics include:

Vulnerabilities
Common Vulnerabilities and Exposures (CVE)
Project Atomic
SCAP
OpenSCAP
SCAP Workbench
oscap tool, oscap-ssh, oscap-docker, oscap-vm
atomic scan
SCAP Security Guide
tailoring / customization of SCAP content
SCE
Spacewalk/Satellite 5 SCAP integration
Foreman/Satellite 6 SCAP integration
USGCB, PCI-DSS, DISA STIG compliance

Martin Preisler, Red Hat

Martin Preisler works as a software engineer at Red Hat, Inc. He works on the Security Technologies team, focusing on security compliance using Security Content Automation Protocol. He is the principal author of SCAP Workbench, a frequent contributor to OpenSCAP and SCAP Security Guide, and a contributor to the SCAP standard specifications. Outside of Red Hat, he likes to work on open source projects related to real-time 3D rendering and game development.

Mini Tutorial II

Back Bay Ballroom C

Machine Learning for SREs

Friday, 11:00 am–12:30 pm PST

Matt Harrison, MetaSnake

What is Machine Learning? How do you use it? With a little Python knowledge and background information, machine learning is very approachable. This tutorial will provide an introduction to the various fields and show some examples that apply to SREs.

Attendees should come with a laptop with Python 3, Jupyter, Pandas, and Matplotlib installed (consider using the Anaconda distribution for easy installation).

Who should attend:
Folks curious about how they might use machine learning

Take back to work:
Basic knowledge of machine learning concepts and how to use Python to apply them

Topics include:
scikit-learn, jupyter, and python

Matt Harrison, MetaSnake

Matt is a Python user, presenter, author, and user group organizer. He authored the best-selling books Treading on Python, Vols. 1 & 2, and Learning the Pandas Library. He runs MetaSnake, which provides Python and data science consulting as well as corporate training.

2:00 pm–3:30 pm

Talks I

Constitution Ballroom A

Network-Based LUKS Volume Decryption with Tang

Friday, 2:00 pm–2:45 pm PST

Brian J. Atkisson, Red Hat

Available Media

LUKS has long been the standard for volume encryption on Linux systems. It is easy to use and provides a high level of data security, especially for Linux laptops. However, using LUKS to encrypt server volumes, especially root volumes, poses significant issues when managing systems at scale. To date, your options for LUKS root volume encryption have been to establish a remote console connection at system boot or to store a key blob unsecured. The former is impractical with more than a handful of systems, and the latter eliminates any security gained by using encryption in the first place.

The use case for server disk encryption differs somewhat from laptop encryption. You want a system to be able to boot without admin interaction while it is in your secured operating environment, but its data should stay protected if someone attempts to access the volume by other means. Common examples would be sending a failed disk back to a vendor, or a third party gaining access to your back-end storage array or AWS volumes.

This talk will focus on a solution to this problem and demonstrate how one can use a network-based service to securely unlock LUKS volumes at boot while maintaining encrypted data at rest.

Brian J. Atkisson, Red Hat

Brian J. Atkisson has 18 years of production systems engineering and operations experience, focusing primarily on identity management and virtualization solutions. He has worked in these roles for the University of California, Jet Propulsion Laboratory, and Red Hat, Inc. He is a Red Hat Certified Architect and Engineer, in addition to holding many other certifications and a B.S. in Microbiology. He currently is a Senior Principal Systems Engineer on the Identity and Access Management team within Red Hat IT.

Panel: Sysadmins Ask the Managers

Friday, 2:45 pm–3:30 pm PST

Moderator: Andy Seely, Science Applications International Corporation
Panelists: Carolyn Rowland, National Institute of Standards and Technology (NIST); Cory Lueninghoener, Los Alamos National Laboratory; Mike Rembetsy, Bloomberg; Scott Cromar, Convergys; Connie-Lynne Villani, Grilled Cheese Invitational

Available Media

Sysadmins Ask the Managers is a guided question and answer panel with technical managers from different industries and at different managerial levels. Part "Dilbert," part "One FTE," and part "XKCD," the panel will offer management vision and explore management topics as they relate to every sysadmin’s life. Come to ask the hard questions you always wished you could, and hear managers talk about how they define success, what they see as their challenges, what they expect out of employees, and how they can help you. Take away real insight into the relationship between sysadmins and managers that will lead to stronger relationships, reduced friction, and more effective teams.

The managers' panel will consist of four to five people from different industries and levels of management, all of whom have at least direct supervisory responsibility. The goal is a range of junior to senior managers.

Andy Seely, SAIC

Andrew “Andy” Seely is a Solution Architect for Science Applications International Corporation (SAIC) in Tampa, Florida. He got his start in IT as a night shift tier-one and spent the last 20 years in the government and military sector, growing as a sysadmin, engineer, supervisor, and eventually a technical manager for small and large teams. Along the way, he learned that the hardest skill of a good manager is knowing when to be the boss and knowing when to be quiet and listen. Andy shared many of his management experiences in his bimonthly column “/var/log/manager” for USENIX ;login: magazine in 2014 and 2015, and he has chaired the Government and Military System Administration Workshop at LISA almost every year since 2008.

Talks II

Constitution Ballroom B

An Admin's Guide to Data Visualization

Friday, 2:00 pm–2:45 pm PST

Caskey L. Dickson, Microsoft Corporation

Available Media

Go beyond the line and bar chart. Come learn the essentials of presenting complex numerical data in a meaningful and actionable manner. Don't just toss up a table of unintelligible numbers. Use that information to tell a story and do it in a way that is compelling, not confusing. Learn the techniques and pitfalls to convey real meaning with your valuable data.

This talk will cover the basics of data presentation including common techniques and pitfalls. The goal is to move people beyond the wall of numbers and enable coherent visualization of large data sets.

If you’ve ever been in a presentation where a wall of numbers is thrown up, leaving it up to you to find the meanings and trends, then this talk will show you how to convert that data into a story. Commonly used graph types will be covered, along with when and how each type is most appropriately used. Examples will be provided of both good and bad data presentation, and attendees will come away with an understanding of both how to present data effectively and the psychology of how people interpret visual data.

Topics covered will include:

Bertin Efficiency
Why pie charts should die
Common graphs (line, bar, scatter)
Advanced graphs (box plots, cycle plots, trellises, and nightingales)
Binning (hexagonal, rectangular, etc.)
Aspect ratios and “banking to the 45”
Scales and axes

Caskey L. Dickson, Microsoft Corporation

Caskey L. Dickson is a Site Reliability Engineer at Microsoft, where he is part of the leadership team reinventing operations at Azure. Before that, he was at Google, where he worked as an SRE/SWE writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines. He has worked in online services since 1995, when he turned up his first web server, and has been online ever since. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a master's in Systems Engineering, and an M.B.A. from Loyola Marymount.

Interplanetary DevOps at NASA JPL

Friday, 2:45 pm–3:30 pm PST

Dan Isla, NASA Jet Propulsion Laboratory

Available Media

At the Jet Propulsion Laboratory, real-time analytics for data collected from the Mars Rover Curiosity is critical when millions of telemetry data points are received daily. Building portable containerized data systems and tools that can be continuously deployed enables our Systems Engineers and Data Scientists to quickly develop, analyze, and share their visualizations and algorithms. With the AWS GovCloud region, export-controlled data can be securely stored and processed using the familiar AWS services and APIs that scale on demand. Containers, DevOps, and high levels of automation are the most important concepts when building infrastructure at scale that can be robust and operated by just a few admins. DevOps is more than just automation and fancy tools and is really about culture change within the organization. At JPL and other government agencies, legacy is everywhere from the apps to the ops; with the Analytics Cloud Services, we have successfully demonstrated ways to modernize legacy systems using containers to make them more secure and operable on modern infrastructure. In this talk, Dan will share how his team revolutionized Interplanetary Mission Operations and created a new paradigm for software development and collaboration at JPL.

Dan Isla, NASA Jet Propulsion Laboratory

Dan Isla is a Systems Engineer and Data Scientist at the NASA Jet Propulsion Laboratory. Dan has been at JPL for the past seven years building and launching spacecraft to Mars and beyond. He is now the lead engineer for the Analytics Cloud Services, a container-based PaaS built with Mesos and Elasticsearch that has transformed Mission Operations across the agency.

Talks III

Back Bay Ballroom AB

Unik: A Platform for Automating Unikernels Compilation and Deployment

Friday, 2:00 pm–2:45 pm PST

Idit Levine, Dell EMC

Available Media

Unikernels, executable images that can run natively on a hypervisor without the need for a separate operating system, are rapidly gaining momentum. To integrate unikernels into the ecosystem, cloud-computing platforms as a service need to provide unikernels with the same services they provide for containers. Here we present Unik, an open source orchestration system for unikernels. Unik handles the compilation of libraries and applications for running on a variety of cloud providers, manages their scheduling, and ensures their health. To provide the user with a seamless PaaS experience, Unik is integrated as a backend to Docker, Kubernetes, and the Cloud Foundry runtime.

Idit Levine, Dell EMC

Idit Levine is the CTO for cloud management division at EMC and a member of its global CTO office. Her passion and expertise are focused on Management and Orchestration (M&O) over the entire stack and on microservice, cloud native apps, and Platform as a Service. Idit’s fascination with the cloud sprouted when she joined DynamicOps (vCAC, now part of VMware) as one of its first employees. She subsequently took part in developing the new-generation public cloud of Verizon Terremark, and served as an acting CTO at Intigua, a startup company that focuses on container and management technology.

Failure: Why It Happens & How to Benefit from It

Friday, 2:45 pm–3:30 pm PST

VM (Vicky) Brasseur, Hewlett Packard Enterprise

Available Media

Projects fail in droves. Up to 90% of new businesses fail within 10 years. Screws fall out all the time; the world is an imperfect place.

Just because it happens doesn’t mean we can’t do our best to prevent it or—at the very least—to minimize the damage when it does. As a matter of fact, embracing failure can be one of the best things you do for your project. Failure is a vital part of evolution. By learning to love failure we learn how to take the next step forward. Ignoring or punishing failure leads to stagnation and wasted potential.

During this session I'll cover:

The most common causes for failure
Suggestions for how to avoid failing
How to use failure to your advantage

VM (Vicky) Brasseur, Hewlett Packard Enterprise

In her nearly 20 years in the tech industry, VM (aka Vicky) has been an analyst, programmer, product manager, software engineering manager, director of software engineering, and C-level technical business consultant. She is currently proud to be a Senior Engineering Manager at Hewlett Packard Enterprise, working in service to a team 100% dedicated to open source development. Vicky is the winner of the Perl White Camel Award (2014) and the O'Reilly Open Source Award (2016).

Vicky occasionally blogs at {anonymous => 'hash'};, often writes and is a moderator for opensource.com, and frequently tweets at @vmbrasseur.

Mini Tutorial I

Commonwealth Ballroom

Culture-Driven Incident Response

Friday, 2:00 pm–3:30 pm PST

Dave Nicholson, SRE, Atlassian

Effective Incident Response is hard. While large-scale frameworks such as FEMA's Incident Command System (ICS) can be adapted to SRE and Ops, making them work within your groups requires taking your working culture into consideration. In this class we examine how Atlassian integrates Incident Management into existing workflows and tooling, using the prevalent work culture to drive both the resolution of incidents and the adoption of a consistent Incident Management process.

Who should attend:
Anyone who deals with incidents

Take back to work:
How to leverage your existing tools and culture to solve new problems

Topics include:
Incident Management, Post Incident Review, Incident Command

Dave Nicholson, SRE, Atlassian

Dave has worked at Atlassian for the past four years, starting as a Support Engineer and more recently as a Site Reliability Engineer. Prior to that, he wore many technology hats, including Java development, consulting, and supporting healthcare systems. Outside of work, he is active in charity work and is a member of the Board of Directors of the Central Texas SPCA.

Mini Tutorial II

Back Bay Ballroom C

Running with MongoDB: Build Your Cluster Like a Champ!

Friday, 2:00 pm–3:30 pm PST

Nuri Halperin, Plus N Consulting, Inc.

MongoDB is an open-source, document-oriented, NoSQL database that is fast becoming the default choice for many organizations. Although deceptively easy to run out of the box, MongoDB is a first-class database engine that merits proper care and knowledge to run successfully in a large-scale production environment.

Attendees will learn and implement several techniques to make a MongoDB installation run smoothly. These include replica sets, deployment topology, sharding for scale, and performance tuning and insights.

Who should attend:
System administrators, DBAs, DevOps engineers, and anyone supporting and operating a MongoDB deployment in production.

Take back to work:

A solid understanding of MongoDB's scale and durability mechanisms.
How to plan and deploy replica sets and sharded clusters
How to diagnose and remedy(!) performance issues.

Topics include:

MongoDB's durability and scale mechanisms
Setting up replica sets
Setting up sharding
Detecting performance issues
Addressing performance issues with proper indexing and planning

Nuri Halperin, Plus N Consulting, Inc.

Nuri Halperin helps companies optimize their software investments. His company designs and builds scalable systems, websites, and business applications. He's been turning projects into success stories for a variety of clients for over two decades. From founding CTO of Jdate.com to international e-commerce multilingual websites, to social photo sharing—he gets things done.

Nuri is a frequent speaker at tech events, and author of several online courses. His instructional videos can be found on MSDN Channel 9 and at Pluralsight.com. He is a strong supporter of the developer community, helping people expand their knowledge and capabilities.

Nuri was the inaugural recipient of MongoDB's William Zola Outstanding Contributor Award, and is a MongoDB Master.

3:30 pm–4:00 pm

Break with Refreshments

Grand Ballroom Foyer

4:00 pm–5:30 pm

Closing Plenary

Constitution Ballroom