Conference Program

A variety of topics are being covered at LISA '14. Use the icons listed below to focus on a key subject area:


  • Culture

  • DevOps

  • Monitoring/Metrics

  • Security

  • Systems Engineering

Follow the icons throughout the conference program below. You can combine days of training or workshops with days of conference program content to build the conference that meets your needs. Pick and choose the sessions that best fit your interest—focus on just one topic or mix and match.

The conference papers are available to registered attendees immediately and to everyone beginning Wednesday, November 12, 2014. Everyone can view the abstracts and the proceedings front matter immediately.

Proceedings Front Matter
Cover Page | Title Page and List of Organizers | Table of Contents | Message from the Program Chair

Full Proceedings PDFs
 LISA14 Full Proceedings (PDF)
 LISA14 Proceedings Interior (PDF, best for mobile devices)

Full Proceedings ePub (for iPad and most eReaders)
 LISA14 Full Proceedings (EPUB)

Full Proceedings Mobi (for Kindle)
 LISA14 Full Proceedings (MOBI)

Download Proceedings (Conference Attendees Only)

Attendee Files 

(Registered attendees: Sign in to your USENIX account to download this file.)

LISA14 Proceedings Archive (ZIP, includes attendee lists)

 

Wednesday, November 12, 2014

8:00 am–8:45 am Continental Breakfast Second Floor Foyer

8:45 am–9:00 am Wednesday

Opening Remarks and Awards

Session Chair: Nicole Forsgren Velasquez, Utah State University

Grand Ballroom ABC

9:00 am–10:30 am Wednesday

W-Keynote Address

LISA14: Syseng

Open Compute Project and the Changing Data Center

9:00 am-10:30 am
Keynote Address

Ken Patchett, Facebook

Grand Ballroom ABC

Ken is responsible for Facebook’s data center operations in the Western Region, including the company's facilities in Prineville, Oregon, and Altoona, Iowa. Altoona is Facebook’s fourth owned and operated data center and is built to specifications developed as part of the Open Compute Project (OCP). Prior to joining Facebook in 2010, Ken established data centers for Google across the United States and Asia. His career has spanned several industries, from mechanical engineering at a pulp and paper manufacturer to rising through the ranks of Compaq and Microsoft, where he initiated its network operations team, including security management, routing, switching, and content networking technologies.

We’ve all seen the impact that open source has had on innovation in software; open sharing and collaboration have been at the root of some of our greatest achievements as an industry. Similarly, the Open Compute Project, a prominent industry initiative focused on driving greater openness and collaboration in infrastructure technology, has cultivated a community working together to establish common standards for scalable and highly efficient technologies that everyone can adopt and build upon, from the bottom of the hardware stack to the top. Facebook’s Ken Patchett will provide a brief history of the project and an overview of current technologies, and discuss how open compute platforms are shaping the future of computing. He will share lessons from his work as Facebook’s director of data center operations in the Western region and highlight how open source is changing the data center.

Available Media

10:30 am–11:00 am Break with Refreshments Second Floor Foyer

11:00 am–12:30 pm Wednesday
Grand Ballroom A | Grand Ballroom B | Grand Ballroom C | Grand Ballroom D | Cedar AB | Willow A

W-Talks 1b

LISA14: Culture

"You Code Like a Sysadmin"—Software Development for the Non-Developer

11:00 am-11:45 am
Invited Talk

H. Wade Minter, Adwerx

H. Wade Minter is the Chief Technology Officer at TeamSnap, a company that makes life easier for people who participate in youth and adult recreational sports. He is also the ring announcer for a professional wrestling federation. The two roles may or may not be related.

The software development community is filled with brilliant, talented people who thrive on the latest programming methodologies, breathe agile, live scrum, and scoff at anything with less than 100% test coverage. But what if you're just a nerd with an idea and a little programming knowledge? Is there a place for you?

In this talk, you'll learn how a sysadmin turned into a developer and helped build a multimillion-dollar company, while doing everything wrong. Feel free to share your stories, and we'll move beyond the stigma of coding like a sysadmin!

Available Media

  • Read more about "You Code Like a Sysadmin"—Software Development for the Non-Developer

Best Practices for When s*IT Hits the Fan

11:45 am-12:30 pm
Invited Talk

Dave Cliffe, PagerDuty

Dave is an engineer who has adopted a more peaceful role as "sherpa" on the Product team at PagerDuty, a company whose sole goal is to make the lives of DevOps engineers everywhere a calmer, sanity-filled reality. Before PagerDuty, Dave worked in cloud computing at Microsoft on the Windows Azure team. Frequently, he wonders which is scarier: being an on-call engineer responsible for an outage or being a parent. The debate rages on.

Outages suck; how you handle them shouldn’t. At PagerDuty, we talk to real customers experiencing real outages all the time. Operations escalations and downtime can be handled in many ways:

  • During the incident: who to alert when, how to communicate, handling dependency and downstream failures, disclosure
  • After the incident: post-mortems, public disclosure, formalizing process vs. investing in automation, preventative actions

There are also ways to keep engineers sane, customers happy, and the $$$ flowing. In this talk, come learn about best practices from across the industry, including how PagerDuty executes during an outage (but trust us, those never happen).

Available Media

W-Talks 2b

LISA14: Metrics

Rethinking Metrics: Metrics 2.0

11:00 am-11:45 am
Invited Talk

Dieter Plaetinck, Vimeo

Dieter Plaetinck is a Belgian engineer living in NYC. Realizing existing metrics solutions didn't cut it while working on complicated backends and infrastructure at Vimeo, he set out to make the experience smoother via projects like the Graph-Explorer metrics dashboard and statsdaemon, a metrics aggregator. He also works on a bunch of other open source projects like anthracite, an event logging app, and uzbl, a minimalist web browser for control freaks. He maintains the metrics 2.0 spec (metrics20.org) and runs a website dedicated to proper use of standardized units and prefixes. He tweets and blogs (sometimes).

@Dieter_be

As the number of metrics, the software that produces and processes them, and the people involved with them continue to increase, we need better, consistent ways to organize metrics and make them self-describing. Leveraging this, we can then automatically build graphs and dashboards, given a query that represents an information need, even for complicated cases. We can also build richer visualizations, alerting, and fault detection. This talk will introduce the concepts and related tools, demonstrate possibilities using the Graph-Explorer interface, and lay the groundwork for future work.

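As a concrete illustration of the self-describing idea, here is a minimal sketch in Python; the tag keys and values below are illustrative assumptions, not copied from the talk or the metrics 2.0 spec text:

```python
# A traditional metric name: meaning is baked into position and naming convention.
legacy_name = "stats.web12.disk.sda1.bytes_used"

# A self-describing metric in the spirit of metrics 2.0: orthogonal key-value tags.
metric = {
    "host": "web12",       # which machine emitted the measurement
    "what": "disk_space",  # what is being measured
    "device": "sda1",      # which device it applies to
    "type": "used",        # used vs. free
    "unit": "B",           # bytes, so tools can convert and aggregate correctly
}

def matches(metric: dict, query: dict) -> bool:
    """A query is just a subset of tags; graphs and alerts can be built from
    queries such as {'what': 'disk_space', 'type': 'used'} without hard-coding
    every host or device name."""
    return all(metric.get(key) == value for key, value in query.items())

print(matches(metric, {"what": "disk_space", "unit": "B"}))  # True
```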

Available Media

HPC Resource Accounting: Progress Against Allocation—Lessons Learned

11:45 am-12:30 pm
Invited Talk

Ken Schumacher, Fermi National Accelerator Laboratory

Ken Schumacher is a computing professional with 35 years of experience who has spent the last 17 years at Fermi National Accelerator Laboratory. He currently helps support several HPC compute clusters, monitoring and reporting resource usage against allocations. Previously he worked with teams supporting lab-wide Unix systems, farm and grid systems, as well as data storage systems (currently over 300 PB of tape).

The stakeholders who fund and oversee our HPC facilities need to know how our resources are utilized. Based on usage, we manage priorities and quotas so all users can get their fair share. Each year we allocate normalized CPU-core hours across our clusters. I will describe our usage reporting, including how to incorporate additional "charges" for online storage, offline storage, and dedicated processors. You will learn how we deal with credits for failed jobs and periods of reduced performance due to load shed events. And I'll describe the usefulness of calculating burn rates for reprioritizing batch queues.

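To make the burn-rate idea concrete, here is a small, hypothetical calculation in Python; the field names and numbers are assumptions for illustration, not Fermilab's actual accounting model:

```python
def burn_rate(used_core_hours: float, allocated_core_hours: float,
              days_elapsed: int, days_in_period: int = 365) -> float:
    """Ratio of the fraction of the allocation consumed to the fraction of the
    allocation period elapsed. A value above 1 means the project is consuming
    its allocation faster than a uniform pace; below 1 means it is under-using it."""
    fraction_used = used_core_hours / allocated_core_hours
    fraction_elapsed = days_elapsed / days_in_period
    return fraction_used / fraction_elapsed

# Hypothetical project: 1.2M of 2M normalized CPU-core hours used, 200 days into the year.
rate = burn_rate(used_core_hours=1_200_000, allocated_core_hours=2_000_000, days_elapsed=200)
print(f"burn rate: {rate:.2f}")  # ~1.10: slightly ahead of pace, a candidate for lower batch priority
```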

Available Media

W-Talks 3b

LISA14: Syseng

The Stack Exchange Infrastructure: How We Scale Our Windows Based Stack at the World's 50th Largest Website Network

11:00 am-11:45 am
Invited Talk

George Beech, Stack Exchange, Inc.

George is an SRE generalist at Stack Exchange. He has worked on every part of the stack from Windows, to Linux, to the network infrastructure.

His experience working in the IT field over the past thirteen years has led him to love working with multiple technologies, and allowed him to experience everything from running a small network as a consultant to being part of a large team running very large scale infrastructure.

We are nuts about performance, and very proud of what we are able to do with a Microsoft based stack at scale.

This talk will go over the architecture used at Stack Exchange to serve millions of users on a WISC stack. I will cover the general architecture, our tooling, and what we use to make the site run fast.

Available Media

Radical Ideas from the Practice of Cloud Computing

11:45 am-12:30 pm
Invited Talk

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration,” has just been released.

Tom will highlight some of the most radical ideas from the new book “The Practice of Cloud System Administration.” Topics will include: most people use load balancers wrong; you should randomly power off machines; cloud computing will eventually be so inexpensive that you won’t be able to justify running your own hardware; the most highly reliable systems are built on cheap hardware that breaks a lot; and sysadmins should never say no to installing new releases from developers. And many more!

Available Media

W-Mini Tutorials 1b

LISA14: Dev-Ops

Introduction to Docker

11:00 am-12:30 pm
Mini Tutorial

Jérôme Petazzoni and Nathan LeClaire, Docker

Jerome is a senior engineer at Docker, where he rotates between Ops, Support and Evangelist duties. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script. His left hand cares for the dotCloud PAAS servers, while his right hand builds cool hacks around Docker.

W-Mini Tutorials 2b

LISA14: Syseng

Examining System Crashes and Hangs

11:00 am-12:30 pm
Mini Tutorial

Max Bruning, Joyent

Max Bruning began using and programming Unix-based systems while obtaining a Master's degree at Columbia University in the late 1970s. He has spent many years doing kernel development, as well as teaching Unix courses at various companies. He has done consulting and/or training work for Bell Labs, AT&T, Motorola, Sun Microsystems, HP, Siemens-Nixdorf, and various other companies. In September 2010, he started porting Linux KVM to SmartOS for Joyent. He is currently the Training Director at Joyent.

LISA Lab Session 1

LISA Lab Office Hours

11:00 am-12:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Team Communications
H. Wade Minter, Adwerx

AI Planner for System Configuration Orchestration
Herry Herry, University of Edinburgh

12:30 pm–2:00 pm Conference Lunch on the Expo Floor Wednesday

2:00 pm–3:30 pm Wednesday
Grand Ballroom A | Grand Ballroom B | Grand Ballroom C | Grand Ballroom D | Cedar AB | Willow A

W-Talks 1c

LISA14: Security

Building a One-Time-Password Token Authentication Infrastructure

2:00 pm-2:45 pm
Invited Talk

Jonathan Hanks, LIGO Lab/California Institute of Technology, and Abe Singer, Laser Interferometer Gravitational Wave Observatory, Caltech

Jonathan Hanks is currently a software engineer at the Laser Interferometer Gravitational Wave Observatory (LIGO) in Hanford, WA. Previously he worked as a system administrator for five years with LIGO, which included system administration duties, several development projects, and supporting the LIGO Identity and Access Management Infrastructure. Prior to LIGO he worked in mobile devices.

Abe Singer is the Chief Security Officer for the Laser Interferometer Gravitational Wave Observatory and the LIGO Scientific Collaboration, and formerly the Chief Security Officer of the San Diego Supercomputer Center. At times he has been a programmer, system administrator, security geek, consultant, and expert witness. He is based at the California Institute of Technology in Pasadena.

One-time passwords provide more security than passwords, but what about risks, multiple sites, provisioning, and distribution? At LIGO, our infrastructure supports using a single token across multiple sites and tolerates network failures while minimizing the overhead of managing and distributing tokens. We don’t have to trust third-party services, use “black box” software, or install custom client software. We also support OTP for Kerberos without any client-side modifications. This talk covers our approach to evaluating and deploying token authentication, including risks, requirements, system architecture, multi-site support, fault tolerance, Kerberos, and our experiences using it over the past couple of years.

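For readers new to OTP tokens, the sketch below shows how a standard time-based one-time password (TOTP, RFC 6238) is computed; it is a generic, standard-library Python illustration and says nothing about LIGO's particular tokens, Kerberos integration, or distribution scheme.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238) for the current time."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // step                  # number of 30-second steps since the epoch
    msg = struct.pack(">Q", counter)                    # counter as an 8-byte big-endian integer
    digest = hmac.new(key, msg, hashlib.sha1).digest()  # HOTP uses HMAC-SHA1 by default
    offset = digest[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# Token and server derive the same short-lived code from a shared secret and the clock.
print(totp("JBSWY3DPEHPK3PXP"))  # example secret, base32-encoded
```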

Available Media

JEA—A PowerShell Toolkit to Secure a Post-Snowden World

2:45 pm-3:30 pm
Invited Talk

Jeffrey P. Snover, Microsoft

Jeffrey Snover is a Distinguished Engineer and Lead Architect for the Windows Server & System Center Division, and is the inventor of Windows PowerShell, an object-based distributed automation engine, scripting language, and command line shell.

When asked what to do about corporate hacking, Ex-NSA Director Michael Hayden replied, "Man up and defend yourselves." Within a few years, Edward Snowden rocked the world by disclosing information he had gathered using his NSA administrative privileges. JEA stands for Just Enough Admin. It is a PowerShell DSC toolkit that you can use to "man up and defend yourselves" by allowing admins to perform functions without giving them admin privileges across a large set of systems.

Available Media

Paper Session: Head in the Clouds

HotRestore: A Fast Restore System for Virtual Machine Cluster

2:00 pm-2:15 pm
Refereed Paper

Lei Cui, Jianxin Li, Tianyu Wo, Bo Li, Renyu Yang, Yingjie Cao, and Jinpeng Huai, Beihang University

A common way for a virtual machine cluster (VMC) to tolerate failures is to create a distributed snapshot and then restore from that snapshot upon failure. However, restoring the whole VMC suffers from long restore latency due to large snapshot files. In addition, different latencies lead to discrepancies in start time among the virtual machines: a virtual machine (VM) that started earlier cannot communicate with a VM that is still restoring, leading to the TCP backoff problem.

In this paper, we present a novel restore approach called HotRestore, which restores the VMC rapidly without compromising performance. First, HotRestore restores a single VM through an elastic working set, which prefetches the working set with a scalable window size, thereby reducing restore latency. Second, HotRestore constructs a communication-induced restore dependency graph and then schedules the restore line to mitigate the TCP backoff problem. Finally, a restore protocol is proposed to minimize the backoff duration. In addition, a prototype has been implemented on QEMU/KVM. The experimental results demonstrate that HotRestore can restore the VMC within a few seconds whilst reducing the TCP backoff duration to merely dozens of milliseconds.

Available Media

Compiling Abstract Specifications into Concrete Systems—Bringing Order to the Cloud

2:15 pm-2:30 pm
Refereed Paper

Ian Unruh, Alexandru G. Bardas, Rui Zhuang, Xinming Ou, and Scott A. DeLoach, Kansas State University

Currently, there are important limitations in the abstractions that support creating and managing services in a cloud-based IT system. As a result, cloud users must choose between managing the low-level details of their cloud services directly (as in IaaS), which is time-consuming and error-prone, and turning over significant parts of this management to their cloud provider (in SaaS or PaaS), which is less flexible and more difficult to tailor to user needs. To alleviate this situation we propose a high-level abstraction called the requirement model for defining cloud-based IT systems. It captures important aspects of a system’s structure, such as service dependencies, without introducing low-level details such as operating systems or application configurations. The requirement model separates the cloud customer’s concern of what the system does, from the system engineer’s concern of how to implement it. In addition, we present a “compilation” process that automatically translates a requirement model into a concrete system based on pre-defined and reusable knowledge units. When combined, the requirement model and the compilation process enable repeatable deployment of cloud-based systems, more reliable system management, and the ability to implement the same requirement in different ways and on multiple cloud platforms. We demonstrate the practicality of this approach in the ANCOR (Automated eNterprise network COmpileR) framework, which generates concrete, cloud-based systems based on a specific requirement model. Our current implementation targets OpenStack and uses Puppet to configure the cloud instances, although the framework will also support other cloud platforms and configuration management solutions.

Available Media

An Administrator’s Guide to Internet Password Research

2:30 pm-2:45 pm
Refereed Paper

Dinei Florêncio and Cormac Herley, Microsoft Research; Paul C. van Oorschot, Carleton University

The research literature on passwords is rich but little of it directly aids those charged with securing web-facing services or setting policies. With a view to improving this situation we examine questions of implementation choices, policy and administration using a combination of literature survey and first-principles reasoning to identify what works, what does not work, and what remains unknown. Some of our results are surprising. We find that offline attacks, the justification for great demands of user effort, occur in much more limited circumstances than is generally believed (and in only a minority of recently-reported breaches). We find that an enormous gap exists between the effort needed to withstand online and offline attacks, with probable safety occurring when a password can survive 10^6 and 10^14 guesses, respectively. In this gap, eight orders of magnitude wide, there is little return on user effort: exceeding the online threshold but falling short of the offline one represents wasted effort. We find that guessing resistance above the online threshold is also wasted at sites that store passwords in plaintext or reversibly encrypted: there is no attack scenario where the extra effort protects the account.
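
A quick worked example of the gap the authors describe, using the thresholds from the abstract and assumed attack-resistance values (the example value is illustrative, not a figure from the paper):

```python
import math

ONLINE_THRESHOLD = 10**6    # guesses a password should survive against online attacks (abstract)
OFFLINE_THRESHOLD = 10**14  # guesses it should survive against offline attacks (abstract)

print(math.log10(OFFLINE_THRESHOLD / ONLINE_THRESHOLD))  # 8.0 orders of magnitude between them

def effort_category(guess_resistance: float) -> str:
    """Classify a password's guess resistance against the two thresholds."""
    if guess_resistance < ONLINE_THRESHOLD:
        return "below the online threshold: weak even against rate-limited online guessing"
    if guess_resistance < OFFLINE_THRESHOLD:
        return "inside the eight-order gap: effort beyond 10^6 is largely wasted"
    return "above the offline threshold: probably safe even against offline cracking"

print(effort_category(10**8))  # a strong-ish password that still falls inside the wasted-effort gap
```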

Available Media

Paper Session: Who Watches?

Analyzing Log Analysis: An Empirical Study of User Log Mining

2:45 pm-3:00 pm
Refereed Paper

S. Alspaugh, University of California, Berkeley and Splunk Inc.; Beidi Chen and Jessica Lin, University of California, Berkeley; Archana Ganapathi, Splunk Inc.; Marti A. Hearst and Randy Katz, University of California, Berkeley

Awarded Best Student Paper! 

We present an in-depth study of over 200K log analysis queries from Splunk, a platform for data analytics. Using these queries, we quantitatively describe log analysis behavior to inform the design of analysis tools. This study includes state machine based descriptions of typical log analysis pipelines, cluster analysis of the most common transformation types, and survey data about Splunk user roles, use cases, and skill sets. We find that log analysis primarily involves filtering, reformatting, and summarizing data and that non-technical users increasingly need data from logs to drive their decision making. We conclude with a number of suggestions for future research.

Available Media

Realtime High-Speed Network Traffic Monitoring Using ntopng

3:00 pm-3:15 pm
Refereed Paper

Luca Deri, IIT/CNR and ntop; Maurizio Martinelli, IIT/CNR; Alfredo Cardigliano, ntop

Awarded Best Paper!

Luca Deri is the leader of the ntop project, aimed at developing an open-source monitoring platform. He previously worked for University College London and IBM Research prior to receiving his PhD at the University of Berne. When not working at ntop, he shares his time between the .it Internet Domain Registry (nic.it) and the University of Pisa, where he has been appointed as a lecturer in the CS department.

Monitoring network traffic has become increasingly challenging in terms of number of hosts, protocol proliferation, and probe placement topologies. Virtualised environments and cloud services have shifted the focus from dedicated hardware monitoring devices to virtual machine based, software traffic monitoring applications. This paper covers the design and implementation of ntopng, an open-source traffic monitoring application designed for high-speed networks. ntopng’s key features are real-time analytics for large networks and the ability to characterise application protocols and user traffic behaviour. ntopng was extensively validated in various monitoring environments ranging from small networks to .it ccTLD traffic analysis.

Available Media

Towards Detecting Target Link Flooding Attack

3:15 pm-3:30 pm
Refereed Paper

Lei Xue, The Hong Kong Polytechnic University; Xiapu Luo, The Hong Kong Polytechnic University Shenzhen Research Institute; Edmond W. W. Chan and Xian Zhan, The Hong Kong Polytechnic University

A new class of target link flooding attacks (LFA) can cut off the Internet connections of a target area without being detected, because they employ legitimate flows to congest selected links. Although new mechanisms for defending against LFA have been proposed, deployment issues limit their use since they require modifying routers. In this paper, we propose LinkScope, a novel system that employs both end-to-end and hop-by-hop network measurement techniques to capture abnormal path performance degradation for detecting LFA, and then correlates the performance data and traceroute data to infer the target links or areas. Although the idea is simple, we tackle a number of challenging issues, such as conducting large-scale Internet measurement through noncooperative measurement, assessing the performance on asymmetric Internet paths, and detecting LFA. We have implemented LinkScope with 7,174 lines of C code, and extensive evaluation in a testbed and on the Internet shows that LinkScope can quickly detect LFA with high accuracy and a low false positive rate.

Available Media

W-Talks 3c

LISA14: Dev-Ops

Making "Push On Green" a Reality: Issues & Actions Involved in Maintaining a Production Service

2:00 pm-2:45 pm
Invited Talk

Daniel V. Klein, Google, Inc.

Daniel Klein is a Site Reliability Engineer at Google Pittsburgh, where his job is to look for trouble before it happens. When he's not doing that, he gives talks, teaches engineering and soft-topics classes, mentors Nooglers, and makes trouble. Occasionally, he sleeps.

Daniel has written a LISA booklet on Monitoring, and has given dozens of invited talks around the world. He promises that you'll enjoy this talk *and* learn some useful stuff, too!

Maintaining a production system is complicated. Systems may consist of many components with separate teams responsible for each one (with multiple development, testing, quality assurance, site reliability, and other engineering teams for each component, each with their own hierarchies, rules, and procedures). Having a system to enforce procedures is a good start, but it is far better to have an automated system to actually perform the updates in a safe and controlled manner. We call this process “Push On Green”. We will discuss some of the many factors that have been (or are actively being) addressed in keeping some of our production systems not only up-and-running, but also updated with as little engineer-involvement and user-visible downtime as possible. In the process, we'll show how you can do this in your environment, too.

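As a generic illustration of the idea (a hypothetical sketch, not Google's pipeline; the check and deploy functions are made-up stand-ins), a push-on-green gate advances a release only while every automated signal stays green:

```python
# Hypothetical "push on green" gate: every automated check must be green before the
# release advances, and an unhealthy canary triggers an automatic rollback.
def unit_tests_pass(build): return True          # stand-in checks for illustration
def integration_tests_pass(build): return True
def canary_healthy(build): return True

def deploy(build, fraction): print(f"deploying {build} to {fraction:.0%} of production")
def rollback(build): print(f"rolling back {build}")

def push_on_green(build, checks):
    for check in checks:
        if not check(build):
            return f"blocked: {check.__name__} is red"
    deploy(build, fraction=0.01)                 # canary slice first
    if not canary_healthy(build):
        rollback(build)
        return "rolled back: canary unhealthy"
    deploy(build, fraction=1.0)                  # still green: push everywhere
    return "released"

print(push_on_green("frontend-release-1234", [unit_tests_pass, integration_tests_pass]))
```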

Available Media

Distributing Software in a Massively Parallel Environment

2:45 pm-3:30 pm
Invited Talk

Dinah McNutt, Google, Inc.

This talk describes how Google is using its proprietary package manager to distribute software across its server farm. I’ll describe some of the design decisions and features that allow Google to do things traditional package managers cannot in order to ensure consistency and achieve high performance. Hopefully this talk will inspire attendees to think of creative ways to do packaging and leverage features of popular package managers.

Available Media

W-Mini Tutorials 1c

LISA14: Metrics

while (true) do; How hard can it be to keep running?

2:00 pm-3:30 pm
Mini Tutorial

Caskey L. Dickson, Google, Inc.

Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he works on writing and maintaining monitoring services that operate at "Google scale." Before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and taught undergraduate and graduate computer science at Loyola Marymount University. He has an undergraduate degree in Computer Science, a Masters in Systems Engineering, and an MBA from Loyola Marymount.

W-Mini Tutorials 2c

LISA14: Syseng

Solving Problems and Identifying Bottlenecks with strace and truss

2:00 pm-3:30 pm
Mini Tutorial

Doug Hughes, D. E. Shaw Research, LLC

Doug Hughes graduated from Penn State University with a BE in Computer Engineering in 1991. He has worked for GE Aerospace at the network operations center, worked six years at Auburn University College of Engineering managing the infrastructure for the college of engineering, and spent six years at Global Crossing supporting the global IP infrastructure. Currently he works at D. E. Shaw Research, LLC where he leads a multi-national team of seven System Administrators covering all aspects of data, networking, and clustering infrastructure.

LISA Lab Session 2

LISA Lab Office Hours

2:00 pm-3:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Scaling the Stack Exchange
George Beech, Stack Exchange, Inc.

metrics 2.0
Dieter Plaetinck, Vimeo

3:30 pm–4:00 pm Break with Refreshments on the Expo Floor Wednesday

4:00 pm–5:30 pm Wednesday
Grand Ballroom A | Grand Ballroom B | Grand Ballroom C | Grand Ballroom D | Cedar AB | Willow A

W-Talks 1d

LISA14: Culture

LISA Build: Mind. Blown.

4:00 pm-4:45 pm
Invited Talk

Branson Matheson, Blackphone, and Brett Thorson, Cranial Thunder Solutions

You may not have known it, but you were a (small) part of LISA Build this year. When you walked into the Grand Ballrooms on Tuesday morning, you saw two new access points (Usenix-5Ghz and Usenix 2Ghz). Behind those access points was a team of people who arrived onsite early to build a network from scratch using interesting gear, a lot of technical glue, and plenty of experience. Come to this talk to ask the team questions, hear what we learned while building, running, and destroying a network in seven days, and find out how you can be a part of LISA Build 2015.

Available Media

Burnout and Ops

4:45 pm-5:30 pm
Invited Talk

Lars Lehtonen, opsangeles.com

Lars Lehtonen consults as an engineer for the smallest and largest companies in Los Angeles. He is one year shy of having 20 years of Linux experience.

Infrastructure engineering is a craft learned outside of classrooms. The discipline is ever-changing. Our value is not in credentials or the recall of accumulated facts, but instead by our capacity to tackle the unknown.

Failures of management, product, development, and QA hit us first, usually in the dead of night. Established industries have begrudgingly accepted the need to pay for 24/7 staffing, but our teams are so small that we can find ourselves permanently on call. Some organizations delay hiring ops talent for so long that it is impossible for the new hire to improve the infrastructure. Instead the engineer is sacrificed to an all-hours cycle of quick fixes and looming crises.

The first bout of burnout is inevitable. How are we to know our limits until we run in to them? Burnout, sufficiently advanced, is permanent damage. I've recovered from bad situations in both startups and a huge corporation. I am going to share some war-stories and describe the fixes that I implemented to protect my long-term livelihood.

Available Media

W-Talks 2d

LISA14: Metrics

Feature Flagging at Scale

4:00 pm-4:45 pm
Invited Talk

David Josephsen, librato.com

As the developer evangelist for Librato, Dave Josephsen hacks on tools, writes about statistics, systems monitoring, alerting, metrics collection and visualization, and generally does anything he can to help engineers and developers close the feedback loop in their systems. He writes the "iVoyeur" column on systems monitoring in ;login: magazine.

Librato runs a distributed time-series data storage, analysis, and alerting platform as a service. In this talk we describe our internal feature-flagging system, which combines Rollout[1], ZooKeeper[2] and our in-house campfire chatbot (twke[3]) to transparently enable features for targeted production end-users without disrupting other customers.

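For context, per-user feature gating of the kind tools like Rollout provide typically hashes a stable user identifier into a percentage bucket. The sketch below is a generic, hypothetical Python illustration, not Librato's implementation (which combines Rollout, ZooKeeper, and their chatbot as described above).

```python
import hashlib

# Flag state that, in a real system, would live in shared storage (e.g., ZooKeeper)
# so every process sees the same rollout decision. Names and values are hypothetical.
FLAGS = {
    "new-alerting-ui": {"percentage": 10, "users": {"acme-corp"}},
}

def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in [0, 100)."""
    return int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % 100

def feature_active(flag: str, user_id: str) -> bool:
    cfg = FLAGS.get(flag)
    if cfg is None:
        return False
    if user_id in cfg["users"]:                   # explicitly targeted users always get it
        return True
    return bucket(user_id) < cfg["percentage"]    # gradual percentage-based rollout

print(feature_active("new-alerting-ui", "acme-corp"))        # True: explicitly targeted
print(feature_active("new-alerting-ui", "another-customer")) # depends on its hash bucket
```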

Available Media

Monitoring: The Math Behind Bad Behavior

4:45 pm-5:30 pm
Invited Talk

Theo Schlossnagle, Circonus

A widely respected industry thought leader, Theo is the author of Scalable Internet Architectures (Sams) and a frequent speaker at worldwide IT conferences. Theo is a computer scientist in every respect. He is a member of the IEEE and a senior member of the ACM. He serves on the editorial board of the ACM’s Queue magazine.

Theo resides in Maryland with his wife and three daughters. When speaking about his work, he remarks, “I like tackling hard problems and playing with big toys."

As we monitor more and more systems we can quickly become overwhelmed with data. Large systems today can generate many millions of measurements per second across millions of separate points of instrumentation. We’ve long surpassed human capacity of understanding the whole picture. Even with clever visualizations, the breadth of data is simply too much to reason about.

As our data feeds overflow our mental capabilities, we’re forced down one of two paths: explicitly collect less or implicitly surface interesting data. The latter is hard and requires a fair bit of math. In this presentation, I’ll talk about how we approach these numerical analysis problems and what you should and should not be able to expect.

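One simple example of implicitly surfacing interesting data (a generic sketch, not Circonus's methods) is to flag only the series whose latest value deviates sharply from its own recent history:

```python
from statistics import mean, stdev

def is_interesting(history, latest, threshold=3.0):
    """Surface a series only if its latest value sits more than `threshold`
    standard deviations from its recent mean (a basic z-score test)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Millions of series can be screened this way; only the anomalous ones reach a human.
history = [120, 118, 123, 121, 119, 122, 120, 124]
print(is_interesting(history, 121))  # False: within normal variation
print(is_interesting(history, 190))  # True: worth surfacing
```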

Available Media

W-Talks 3d

LISA14: Syseng

Finding a Good Home: Choosing the Right Data Store

4:00 pm-4:45 pm
Invited Talk

Jeff Darcy, Red Hat

Jeff Darcy has been working on distributed storage since 1989, when that meant DECnet and NFSv2. Since then he has played a significant role in the development of clustered file systems, continuous data protection, and other areas. He is currently a developer at Red Hat, with the rare opportunity to work on two open-source distributed file systems - GlusterFS and Ceph - at once.

Modern high-scale storage systems embody a bewildering variety of interfaces, features, and performance profiles. Starting with an overview of key factors that should drive purchase and design decisions, we'll compare block stores, object stores, file systems, and even databases. Evolutionary trends and emerging technologies will also be highlighted, ranging from new physical media to erasure coding.

Available Media

Five Pitfalls for Benchmarking Big Data Systems

4:45 pm-5:30 pm
Invited Talk

Yanpei Chen and Gwen Shapira, Cloudera, Inc.

Yanpei Chen is a member of the Performance Engineering Team at Cloudera, where he works on internal and competitive performance measurement and optimization. His work touches upon multiple interconnected computation frameworks, including Cloudera Search, Cloudera Impala, Apache Hadoop, Apache HBase, and Apache Hive. He is the lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool that allows someone to synthesize and replay MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool used to certify many Cloudera partners. He received his doctorate at the UC Berkeley AMP Lab, where he worked on performance-driven, large-scale system design and evaluation.

Gwen Shapira is a Solutions Architect at Cloudera. She has 15 years of experience working with customers to design scalable data architectures, having worked as a data warehouse DBA, an ETL developer, and a senior consultant. She specializes in migrating data warehouses to Hadoop, integrating Hadoop with relational databases, building scalable data processing pipelines, and scaling complex data analysis algorithms.

Performance is an increasingly important attribute of Big Data systems as focus shifts from batch processing to real-time analysis and to consolidated multi-tenant systems. One of the little-understood challenges in scaling data systems is properly defining and measuring performance. The complexity, diversity, and scale of big data systems make this a difficult task and we frequently encounter haphazard benchmarks that lead to bad technology choices, poor purchasing decisions, and suboptimal cluster operations. This talk draws on performance engineering and field services experience from a leading Big Data vendor. We will talk about the most common performance benchmarking pitfalls and share practical advice on how to avoid them with rigorous metrics and measurement methods.

Available Media

W-Mini Tutorials 1d

LISA14: Culture

Handling the Interruptive Nature of Operations: A World of Squirrels and Shiny Objects

4:00 pm-5:30 pm
Mini Tutorial

Avleen Vig, Etsy, Inc., and Carolyn Rowland, National Institute of Standards and Technology (NIST)

Avleen is a Staff Operations Engineer at Etsy, where he spends much of his time growing the infrastructure for selling knitted gloves and cross-stitched periodic tables. Before joining Etsy he worked at several large tech companies, including EarthLink and Google, as well as a number of small successful startups.

Carolyn Rowland began her UNIX system administration career in 1991 and currently leads an ops/dev team at the National Institute of Standards and Technology (NIST). She credits her success with being able to be the bridge between senior management and technology. Her team has distinguished itself as a leader in the development of new technology solutions that solve business and research problems within the Engineering Laboratory and across the NIST campus.

W-Mini Tutorials 2d

LISA14: Syseng

Advanced Configuration Management with CFEngine 3.6 and the Design Center

4:00 pm-5:30 pm
Mini Tutorial

Ted Zlatanov, CFEngine

Ted Zlatanov has been working with CFEngine as a programmer and sysadmin since 1999 and holds an MS degree from Boston University.

LISA Lab Session 3

LISA Lab Office Hours

4:00 pm-5:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Moving Target Defense
Alexandru Bardas, Kansas State University

6:30 pm–7:30 pm Poster Session in the Second Floor Foyer Wednesday

 

Thursday, November 13, 2014

8:30 am–9:00 am Continental Breakfast Second Floor Foyer

9:00 am–10:30 am Thursday

T-Keynote Address

LISA14: Dev-Ops

DevOps Patterns Distilled: A Fifteen Year Study of High Performing IT Organizations

9:00 am-10:30 am
Keynote Address

Gene Kim, Author & Researcher

Grand Ballroom ABC

Gene is a multiple award-winning CTO, researcher, and author. He was founder and CTO of Tripwire for 13 years. He has written three books, including “The Visible Ops Handbook” and “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win.” Gene is a huge fan of IT operations and how it can enable developers to maximize throughput of features from “code complete” to “in production” without causing chaos and disruption to the IT environment. He has worked with some of the top Internet companies on improving deployment flow and increasing the rigor around IT operational processes. In 2007, ComputerWorld added Gene to the “40 Innovative IT People Under The Age Of 40” list, and he was given the Outstanding Alumnus Award by the Department of Computer Sciences at Purdue University for achievement and leadership in the profession.

Organizations employing DevOps practices such as Google, Amazon, Facebook, Etsy and Twitter are routinely deploying code into production hundreds, or even thousands, of times per day, while providing world-class availability, reliability and security. In contrast, most organizations struggle to do releases more than every nine months.

The authors of the upcoming “DevOps Cookbook” have been studying high-performing organizations since 1999, capturing and codifying how these organizations achieve this fast flow of work through Product Management and Development, through QA and Infosec, and into IT Operations. By doing so, other organizations can now replicate the extraordinary culture and outcomes, enabling their organization to scale and win in the marketplace.

The goal of the DevOps Cookbook is to help accelerate DevOps adoption, increase the success of DevOps initiatives, and lower the activation energy required for DevOps transformations to start and finish.

Available Media

10:30 am–11:00 am Break with Refreshments on the Expo Floor Thursday

11:00 am–12:30 pm Thursday
Grand Ballroom A | Grand Ballroom B | Grand Ballroom C | Grand Ballroom D | Cedar AB | Willow A | Redwood AB

T-Talks 1a

LISA14: Security

Open Source Identity Management in the Enterprise

11:00 am-12:30 pm
Invited Talk

Brian J. Atkisson, Red Hat

Brian J. Atkisson has 15 years of production systems engineering and operations experience, focusing primarily on identity management and virtualization solutions. He has worked in these roles for the University of California, Jet Propulsion Laboratory, and Red Hat, Inc. He is a Red Hat Certified Architect and Engineer, in addition to holding many other certifications and a B.S. in Microbiology. He currently is a Principal Systems Engineer on the Identity and Access Management team within Red Hat IT.

This talk will discuss how Red Hat IT utilizes and integrates open source solutions to offer a seamless experience for internal users. Specifically, we will cover how Red Hat incorporates SAML, Kerberos, LDAP, Two-Factor Authentication, PKI certificates, and how end-user systems are able to function in this multi-platform, fluid BYOD environment. Recent experiences will be shared on how Red Hat is scaling this identity management platform to utilize true single sign-on in cloud environments. Finally, best practices and future plans will be discussed as part of a Q&A session.

Available Media

T-Talks 1b

LISA14: Culture

Remote Work Panel

11:00 am-12:30 pm
Panel

Moderator: Doug Hughes, D. E. Shaw Research, LLC.

Panelists: Mark Imbriaco, DigitalOcean; Bill Lincoln, Pythian; H. Wade Minter, Adwerx; Michael Rembetsy, Etsy

Doug Hughes graduated from Penn State University with a BE in Computer Engineering in 1991. He has worked for GE Aerospace at the network operations center, worked six years at Auburn University College of Engineering managing the infrastructure for the college of engineering, and spent six years at Global Crossing supporting the global IP infrastructure. Currently he works at D. E. Shaw Research, LLC where he leads a multi-national team of seven System Administrators covering all aspects of data, networking, and clustering infrastructure.

Mark Imbriaco has been in the Internet industry for over 20 years, working in roles running the gamut of software development and operations. He's worked in infrastructure for companies like America Online, LivingSocial, 37signals, Heroku, and GitHub. Currently, he serves as the VP of Technical Operations at DigitalOcean managing operations for their fast growing IaaS cloud.

Bill Lincoln is a Service Delivery Manager & Business Advocate at Pythian, a worldwide organization providing Managed Services and Project Consulting to companies whose data availability, reliability, and integrity are critical to their business. Pythian targets the top 5% of talent in the world; as a result, two-thirds of its workforce works remotely.

H. Wade Minter is the Chief Technology Officer at TeamSnap, a company that makes life easier for people who participate in youth and adult recreational sports. He is also the ring announcer for a professional wrestling federation. The two roles may or may not be related.

Michael Rembetsy has worked in technical operations for more than 10 years in the web, healthcare, online media, and financial industries. He started out on the help desk but moved to operations shortly after starting, and has been building and running data center and operations teams ever since. He previously worked for NBC Universal, iVillage, and McDonald's online Monopoly game. Currently, Michael is the VP of Technical Operations at Etsy.

 

This panel will focus on how companies big and small handle remote workers in Ops roles.

Available Media


T-Talks 1c

LISA14: Syseng

Hardware Design for Cloud Scale Datacenters

11:00 am-11:45 am
Invited Talk

Kushagra Vaid, Microsoft

Kushagra Vaid is the General Manager for Server Engineering in Microsoft’s Cloud & Enterprise division. He is responsible for driving hardware R&D, engineering designs, deployments and support for Microsoft’s cloud scale services (such as Bing, Azure, Office 365, and others) across a global datacenter footprint.

Kushagra has published several papers in international research conferences, and is also the holder of over 25 patents in the field of computer architecture and datacenter design. He is a featured speaker in industry conferences on cloud services, hardware engineering and datacenter architecture.

Cloud computing is growing at an exponential pace with ever more applications being hosted in mega-scale public clouds such as Microsoft Azure. Designing and operating such large infrastructures requires not only significant investments in datacenters, servers, networking and operating systems, but also new paradigms for seamlessly integrating technologies and supply chains to drive higher efficiency and lower overall TCO. In this talk, we will present learnings from Microsoft’s vast experience in operating large scale cloud services on an installed base of 1M+ servers and how those learnings translate into architecture and operational principles for designing hardware infrastructure.

Available Media


A New Age in Alerting with Bosun: The First Alerting IDE

11:45 am-12:30 pm
Invited Talk

Kyle Brandt, Stack Exchange, Inc.

Kyle Brandt is the co-author of the Bosun monitoring system and the Director of Site Reliability at Stack Exchange (the company behind Stack Overflow and Server Fault). He will talk to you about monitoring until he starts to lose his voice. He also enjoys spending time with his wife and pets (2 cats and a dog), video games, weight lifting, and road trips on his Harley.

At conferences we are told to "Be an Engineer!". Being an engineer with alerts means creating accurate and informative alerts so we can own and direct attention. The tools available to us fall short of empowering us to do this. That is why we created Bosun, a free and open source alerting IDE. Bosun has an expression language that decouples alerts from the metrics collected. It allows you to use methods including statistics, forecasts, boolean operations, and anomaly detection to define accurate alerts. You can then create rich and informative notifications. It also lets you experiment with how alerts would have behaved over history. Alerting is now an engineering discipline - come join me in exploring the implications of these new possibilities.
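As a rough illustration of that decoupling idea (this is plain Python, not Bosun's actual expression syntax, and the metric names and thresholds are invented), an alert can be written as an expression over query results rather than being baked into the collector:

    import statistics

    # Toy in-memory "metric store"; in a real system q() would hit a TSDB.
    SERIES = {
        "os.cpu.percent{host=web01}": [62, 71, 88, 93, 90],
        "os.disk.free_gb{host=web01}": [120, 110, 95, 80, 64],
    }

    def q(metric):
        """Query a series by name (placeholder for a real time-series query)."""
        return SERIES[metric]

    def avg(series):
        return statistics.mean(series)

    def forecast(series, steps):
        """Naive linear forecast: extrapolate the average per-sample change."""
        slope = (series[-1] - series[0]) / (len(series) - 1)
        return series[-1] + slope * steps

    # The "alert" is just an expression over queries, independent of how the
    # metrics were collected.
    cpu_hot = avg(q("os.cpu.percent{host=web01}")) > 85
    disk_trouble = forecast(q("os.disk.free_gb{host=web01}"), steps=10) < 0
    if cpu_hot or disk_trouble:
        print(f"CRITICAL: cpu_hot={cpu_hot} disk_trouble={disk_trouble}")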

Available Media


T-Mini Tutorials 1b

LISA14: Syseng

Building PowerShell Commands

11:00 am-12:30 pm
Mini Tutorial

Steven Murawski, Chef

Steven is a Technical Community Manager for Chef and a Microsoft MVP in PowerShell. Steven is a co-host of the Ops All The Things podcast.


T-Mini Tutorials 2b

LISA14: Syseng

Networking in the Cloud Age

11:00 am-12:30 pm
Mini Tutorial

David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. He's currently employed by Citrix Systems in the Open Source Business Office. David is the Vice President, Infrastructure at the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack and Apache jclouds. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.


LISA Lab Session 4

LISA Lab Office Hours

11:00 am-12:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

State of Monitoring
Caskey Dickson, Google, Inc.

Too Much Data
Hal Stern, Merck & Co


T-Vendor Talks 1b

Vendor Talk: An Introduction to OpenStack Swift Object Storage

11:00 am-11:45 am
Vendor

Chris Nelson, SwiftStack

Sponsored by Silicon Mechanics and SwiftStack

Come learn about OpenStack Swift! OpenStack Swift is a purpose-built object storage system designed for scale and optimized for durability, high availability, and massive concurrency across the entire data set. It lets you create an Amazon S3-like public cloud in your own data center. We'll cover the overall architecture and give you a short demo of how to deploy and build a cluster: enough to get you up and running before day's end.
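For the curious, Swift's object API is plain HTTP once you hold an auth token. The sketch below uses only the Python standard library, with a placeholder storage URL, token, and container name; it also assumes the target container already exists:

    import urllib.request

    # Placeholders: a real cluster hands you the storage URL and token at auth time.
    STORAGE_URL = "http://swift.example.com:8080/v1/AUTH_demo"
    TOKEN = "AUTH_tk_replace_me"

    def put_object(container, name, body):
        req = urllib.request.Request(
            f"{STORAGE_URL}/{container}/{name}",
            data=body,
            method="PUT",
            headers={"X-Auth-Token": TOKEN},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status  # 201 Created on success

    print(put_object("backups", "hello.txt", b"hello, swift"))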


Vendor Talk: Building the Modern Cloud-Enabled Data Center with Oracle Linux

11:45 am-12:30 pm
Vendor

Ken Crandall, Oracle Linux and OVM Principal Sales Consultant

In the rapidly changing IT landscape, systems engineering and operations professionals need effective tools to maximize investment and reduce complexity. With modern technologies such as Linux Containers, Docker support, rapid deployment with Oracle VM Templates, and support for OpenStack, Oracle Linux and Oracle VM support the needs of the modern data center. Join this session to learn from Oracle experts how to leverage these exciting new technologies today in your IT infrastructure.


12:30 pm–2:00 pm Conference Lunch on the Expo Floor Thursday

2:00 pm–3:30 pm Thursday
Grand Ballroom A Grand Ballroom B Grand Ballroom C Grand Ballroom D Cedar AB Willow A Redwood AB

T-Talks 2b

LISA14: Security

Penetration Testing in the Cloud

2:00 pm-2:45 pm
Invited Talk

Dan Lambright, Red Hat

Dan Lambright is a principal software engineer at Red Hat. By day he helps build the gluster distributed storage system, and by night he enjoys teaching Intrusion Detection as an adjunct professor at the University of Massachusetts at Lowell.

This talk discusses challenges associated with ensuring your infrastructure is secure in the cloud. Cloud providers are very careful about letting customers run penetration tests because they can be mistaken for real attacks, but such tests are needed to confirm data is safe. This talk discusses the conditions and limits of the permissions obtainable, and explores methods of doing targeted tests in ways that will not affect others using multi-tenant hardware. A promising approach is to have a Docker instance play the role of the hacker and use an instance's internal network interface to carry out attacks.
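As a rough sketch of that approach (the image name and target address below are placeholders, not details from the talk), the "attacker" container can share the instance's network stack so probes stay on the tenant's internal interface:

    import subprocess

    # Placeholder values: choose your own scanner image and an address on the
    # tenant's internal network so no third party is touched by the probes.
    SCANNER_IMAGE = "your-scanner-image"   # assumed to contain nmap
    INTERNAL_TARGET = "10.0.0.5"

    # Run the "attacker" as a throwaway container sharing the instance's network
    # stack, so the scan leaves via the internal interface only.
    subprocess.run(
        ["docker", "run", "--rm", "--net=host", SCANNER_IMAGE,
         "nmap", "-sT", "-p", "22,80,443", INTERNAL_TARGET],
        check=True,
    )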

Available Media


PowerShell Desired State Configuration (DSC)

2:45 pm-3:30 pm
Invited Talk

Jeffrey P. Snover, Microsoft

Jeffrey Snover is a Distinguished Engineer and Lead Architect for the Windows Server & System Center Division, and is the inventor of Windows PowerShell, an object-based distributed automation engine, scripting language, and command line shell.

Configuration Management (CM) systems like Chef and Puppet hit a wall when it comes to managing Windows because of the core architectural differences between Unix and Windows. PowerShell DSC is a tools-agnostic standards-based platform which enables a wide range of CM tools to configure Windows, Linux and other standards-based devices.

Available Media


Paper Session: High Speed

Automatic and Dynamic Configuration of Data Compression for Web Servers

2:00 pm-2:15 pm
Refereed Paper

Eyal Zohar, Yahoo! Labs; Yuval Cassuto, Technion—Israel Institute of Technology

HTTP compression is an essential tool for web speedup and network cost reduction. Not surprisingly, it is used by over 95% of top websites, saving about 75% of webpage traffic.

The currently used compression format and tools were designed over 15 years ago, with static content in mind. Although the web has significantly evolved since then and become highly dynamic, the compression solutions have not evolved accordingly. In the most popular web servers today, compression effort is set as a global and static compression-level parameter. This parameter says little about the actual impact of compression on the resulting performance. Furthermore, the parameter does not take into account important dynamic factors at the server. As a result, web operators often have to blindly choose a compression level and hope for the best.

In this paper we present a novel elastic compression framework that automatically sets the compression level to reach a desired working point considering the instantaneous load on the web server and the content properties. We deploy a fully-working implementation of dynamic compression in a web server, and demonstrate its benefits with experiments showing improved performance and service capacity in a variety of scenarios. Additional insights on web compression are provided by a study of the top 500 websites with respect to their compression properties and current practices.
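The paper's controller is more sophisticated, but the core idea of tying the compression level to instantaneous server load can be sketched in a few lines of Python (the thresholds here are arbitrary placeholders, not the authors' policy):

    import gzip
    import os

    def pick_gzip_level(load_per_cpu):
        """Map current CPU pressure to a gzip level (1 = fastest, 9 = smallest).
        The thresholds are illustrative only."""
        if load_per_cpu < 0.5:
            return 9   # plenty of headroom: spend CPU to save bytes
        if load_per_cpu < 0.8:
            return 6
        if load_per_cpu < 1.0:
            return 3
        return 1       # overloaded: compress as cheaply as possible

    load_per_cpu = os.getloadavg()[0] / (os.cpu_count() or 1)
    level = pick_gzip_level(load_per_cpu)
    body = b"<html>" + b"x" * 10000 + b"</html>"
    print(level, len(body), "->", len(gzip.compress(body, compresslevel=level)))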

Available Media

The Truth About MapReduce Performance on SSDs

2:15 pm-2:30 pm
Refereed Paper

Karthik Kambatla, Cloudera Inc. and Purdue University; Yanpei Chen, Cloudera Inc.

Yanpei Chen is a member of the Performance Engineering Team at Cloudera, where he works on internal and competitive performance measurement and optimization. His work touches upon multiple interconnected computation frameworks, including Cloudera Search, Cloudera Impala, Apache Hadoop, Apache HBase, and Apache Hive. He is the lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool that allows someone to synthesize and replay MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool used to certify many Cloudera partners. He received his doctorate at the UC Berkeley AMP Lab, where he worked on performance-driven, large-scale system design and evaluation.

Solid-state drives (SSDs) are increasingly being considered as a viable alternative to rotational hard-disk drives (HDDs). In this paper, we investigate if SSDs improve the performance of MapReduce workloads and evaluate the economics of using PCIe SSDs either in place of or in addition to HDDs. Our contributions are (1) a method of benchmarking MapReduce performance on SSDs and HDDs under constant-bandwidth constraints, (2) identifying cost-per-performance as a more pertinent metric than cost-per-capacity when evaluating SSDs versus HDDs for performance, and (3) quantifying that SSDs can achieve up to 70% higher performance for 2.5x higher cost-per-performance.
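To make the metric concrete: cost-per-performance is simply price divided by achieved throughput. The toy calculation below uses invented prices, scaled so the ratios match the 70% and 2.5x figures quoted in the abstract:

    # Hypothetical absolute prices, chosen only so the ratios match the abstract's
    # figures (~70% higher throughput, ~2.5x higher cost-per-performance).
    configs = {
        "HDD cluster": {"price_usd": 40_000, "jobs_per_hour": 100},
        "SSD cluster": {"price_usd": 170_000, "jobs_per_hour": 170},
    }

    for name, c in configs.items():
        cost_per_perf = c["price_usd"] / c["jobs_per_hour"]
        print(f"{name}: ${cost_per_perf:,.0f} per (job/hour)")
    # The SSD cluster is faster here but costs more per unit of throughput, which
    # is exactly the trade-off the cost-per-performance metric exposes.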

Available Media

ParaSwift: File I/O Trace Modeling for the Future

2:30 pm-2:45 pm
Refereed Paper

Rukma Talwadker and Kaladhar Voruganti, NetApp Inc.

Historically, traces have been used by system designers for designing and testing their systems. However, traces are becoming very large and difficult to store and manage. Thus, the area of creating models based on traces is gaining traction. Prior art in trace modeling has primarily dealt with modeling block traces and file/NAS traces collected from virtualized clients, which are essentially block I/Os to the storage server. No prior art exists in modeling file traces. Modeling file traces is difficult because of the presence of metadata operations and the stateful semantics of NFS operations.

In this paper we present an algorithm and a unified framework that models and replays NFS as well as SAN workloads. Typically, trace modeling is a resource-intensive process in which multiple passes are made over the entire trace. In this paper, in addition to being able to model the intricacies of the NFS protocol, we provide an algorithm that is efficient with respect to its resource consumption needs by using a Bloom filter based sampling technique. We have verified our trace modeling algorithm on real customer traces and show that our modeling error is quite low.
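As a rough illustration of Bloom-filter-based sampling (not the paper's actual algorithm), a Bloom filter lets a single pass over a trace cheaply skip operation signatures it has already seen, so only novel behavior is kept for detailed modeling:

    import hashlib

    class BloomFilter:
        """Tiny Bloom filter: m bits, k hash positions derived from blake2b."""
        def __init__(self, m_bits=1 << 20, k=4):
            self.m, self.k = m_bits, k
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.blake2b(item.encode(), salt=str(i).encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    # One pass over an imaginary NFS trace: only signatures we have not seen
    # yet are kept for detailed modeling.
    seen = BloomFilter()
    sampled = []
    trace = [("GETATTR", "/vol0/a"), ("READ", "/vol0/a"), ("GETATTR", "/vol0/a")]
    for op, path in trace:
        sig = f"{op}:{path}"
        if sig not in seen:
            seen.add(sig)
            sampled.append((op, path))
    print(sampled)   # the repeated GETATTR on /vol0/a is not re-sampled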

Available Media

T-Talks 2c2

LISA14: Metrics

Super Sizing Your Servers and the Payback Trap

2:45 pm-3:30 pm
Invited Talk

Dr. Neil J. Gunther, Performance Dynamics

Neil Gunther, M.Sc., Ph.D. is a researcher specializing in performance and capacity management. Prior to starting his own consulting company in 1994 (www.perfdynamics.com), Neil worked on the NASA Voyager and Galileo missions, the Xerox PARC Dragon multiprocessor, and the Pyramid/Siemens RM1000 parallel cluster. Neil has authored many technical articles and several books including: Guerrilla Capacity Planning (Springer 2007) and the 2nd edition of Analyzing Computer System Performance with Perl::PDQ (Springer 2011) and received the A.A. Michelson Award in 2008.

As part of IT management, system administrators and ops managers need to size servers and clusters to meet application performance targets; whether it be for a private infrastructure or a public cloud. In this talk I will first establish an analytic framework that can quantify linear, sublinear and negative scalability. This framework can easily be incorporated into Google Docs or R. Several examples including PostgreSQL, Memcached, Varnish and Amazon EC2 scalability will then be presented in detail. The lesser known phenomenon of superlinearity will be examined using this same framework. Superlinear scaling means achieving more performance than the available capacity would be expected to support.
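For readers who want to experiment beforehand, Gunther's Universal Scalability Law is the kind of analytic framework the abstract describes. The sketch below uses arbitrary contention and coherency coefficients, not figures from the talk:

    def usl_capacity(n, sigma, kappa):
        """Universal Scalability Law: relative capacity at concurrency n.
        sigma models contention (serialization), kappa models coherency delay."""
        return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

    # Arbitrary example coefficients, not measurements from the talk.
    sigma, kappa = 0.05, 0.001
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, round(usl_capacity(n, sigma, kappa), 2))
    # Past some concurrency the curve turns over: retrograde (negative) scaling.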

Available Media


T-Talks 3b

LISA14: Syseng

You Have Too Much Data

2:00 pm-2:45 pm
Invited Talk

Hal Stern, Merck & Co

Hal Stern is Executive Director of Applied Technology at Merck & Company, where he focuses on building services and applications to define and shape the data-oriented and data-influenced adjacencies to Merck's core markets. Hal was previously a VP at Juniper Networks and spent more than 20 years at Sun Microsystems. He holds a BSE degree from Princeton University and holds four issued and several filed patents in security, identity, user experience, and networking. Hal has been a frequent speaker at industry and technical conferences, and has co-authored three books, "Professional WordPress: Design and Development" (2010), "Blueprints for High Availability" (2001) and "Managing NFS and NIS" (1991). He is an active Kiva microlender and digital photographer, and may be found on the left wing with his adult ice hockey team, cheering for the NJ Devils, building guitar electronics, or playing golf very badly.

As system administrators we straddle the world of big data hardware and software stacks and the analytics and algorithm development that generates even more synthetic data. We are faced with a mountain of big data *about* big data: there's too much of it and the problem is getting worse the more we join multi-terabyte data sets. How do we make practical sense of the real signals in the noise, finding causality while maintaining privacy and a healthy respect for computational complexity?

Available Media


Data Storage at Librato

2:45 pm-3:30 pm
Invited Talk

David Josephsen, librato.com

As the developer evangelist for Librato, Dave Josephsen hacks on tools; writes about statistics, systems monitoring, alerting, metrics collection, and visualization; and generally does anything he can to help engineers and developers close the feedback loop in their systems. He writes the "iVoyeur" column on systems monitoring in ;login: magazine.

Librato runs a multi-tenant time-series data storage, analysis, and alerting platform as a service. In this talk we describe our internal data processing and storage infrastructure, which relies heavily on Apache’s Storm and Cassandra projects to process around 250k writes per second and store roughly 10 billion data samples. We give infrastructure details, recounting our initial design decisions as well as various scaling challenges that have forced us to refactor our storage designs. Finally we relate the system and performance metrics that we’ve found useful in monitoring our storage infrastructure.

Available Media


T-Mini Tutorials 1c

LISA14: Metrics

Insight Engineering: An Introduction to Modern Monitoring and Alerting

2:00 pm-3:30 pm
Mini Tutorial

Joseph Ruscio, CTO of Librato

Joseph Ruscio is a Co-Founder and the Chief Technology Officer at Librato. He's responsible for the company's technical strategy, product architecture, and hacks on all levels of their vision for the future of monitoring. Joe has 15 years of experience developing distributed systems in startups, academia, and the telecommunications industry and he holds a Masters in Computer Science from Virginia Tech. In his spare time he enjoys snowboarding and obsessing over the details of brewing both coffee and beer. He loves graphs.


T-Mini Tutorials 2c

LISA14: Security

Puppet for the Enterprise

2:00 pm-3:30 pm
Mini Tutorial

Thomas Uphill, Costco Wholesale

Thomas has been using Puppet for several years and has given several tutorials on it. He spoke last year at PuppetConf 2013, where he obtained the Puppet Professional certification. He is currently working on a Puppet book. An RHCA, he currently works with Puppet on the Linux team at Costco Wholesale.

Available Media

LISA Lab Session 5

LISA Lab Office Hours

2:00 pm-3:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Network Booting with PXE
Paul Krizak, Qualcomm, Inc.

RedHat: Hands on with oVirt
Greg Sheremeta


T-Vendor Talks 1c

Vendor Talk: Deploy and Scale OpenStack

2:00 pm-3:30 pm
Vendor

Dustin Kirkland, Canonical

OpenStack is the world's leading open source cloud platform, and is already changing the way enterprises approach cloud computing, specifically enabling massive scale private infrastructure as a service. Any OpenStack deployment involves multiple required components, and many more optional add-ons. As such, your first OpenStack installation can be intimidating, with the various moving pieces. And scaling it—well, where do you even begin? You begin by attending this session! In 90 minutes, we will deploy OpenStack multiple times, to real, physical hardware nodes, on stage, with LEDs that even light up. We'll scale up core services, such as compute (Nova) and storage (Swift, Ceph, Cinder). We'll create a network (Neutron), import images (Glance). And we'll launch some instances. Finally, we'll deploy a real workload to our brand new private cloud. You should leave this session with thorough practical knowledge about OpenStack's core components, as well as effective and efficient mechanisms for deploying and scaling OpenStack in a real environment.

Dustin Kirkland is Canonical's Cloud Product Manager, leading the technical product strategy, road map, and life cycle of the Ubuntu Cloud commercial offerings. Formerly the CTO of Gazzang, a venture funded start-up acquired by Cloudera, Dustin designed and implemented an innovative key management system for the cloud, called zTrustee, and delivered comprehensive security for cloud and big data platforms with eCryptfs and other encryption technologies. Dustin is an active Core Developer of the Ubuntu Linux distribution, maintainer of 20+ open source projects, and the creator of Byobu. Dustin lives in Austin, Texas, with his wife Kim, daughters, and his Australian Shepherds, Aggie and Tiger. Dustin is also an avid home brewer.


3:30 pm–4:00 pm Break with Refreshments Second Floor Foyer

4:00 pm–5:30 pm Thursday
Grand Ballroom A Grand Ballroom B Grand Ballroom C Grand Ballroom D Cedar AB Willow A Redwood AB

T-Talks 1d

LISA14: Syseng

Managing Large Scale Cloud Infrastructure at Rackspace

4:00 pm-4:45 pm
Invited Talk

Jesse Keating, Rackspace

Jesse Keating is a Linux Systems Engineer IV at Rackspace. He has been a part of the Linux community for over 13 years, as a user, contributor, instructor, author, and evangelist. As a believer in Continuous Integration and Continuous Delivery, Jesse is currently in a DevOps role at Rackspace, working on the Public Cloud.

The Rackspace Public Cloud is a large, complex infrastructure. This talk will take a tour of what this infrastructure looks like, how we manage it, how we deploy to it, and the challenges we've faced—from config management choices to orchestration choices, continuous integration/delivery, hot patching, and beyond.

Available Media


Building a "Multi-Landlord" Public Cloud

4:45 pm-5:30 pm
Invited Talk

Peter Desnoyers, Northeastern University

Peter Desnoyers is a member of the Mass Open Cloud leadership team and an associate professor at Northeastern University, where his work has focused on operating systems, flash storage, and most recently cloud computing. He holds a PhD from UMass (2008) and a BS and MS from MIT (1988); in the intervening years he worked at companies ranging from Apple to VMware.

Orran Krieger is the founder of the Mass Open Cloud and a research professor at BU, where he is the founding director of the Center for Cloud Innovation. He spent five years at VMware architecting the vCloud project, before which he was a researcher and manager at IBM T.J. Watson, leading the Advanced Operating System Research Department. He holds a PhD and MASc in Electrical Engineering from the University of Toronto.

The Massachusetts Open Cloud is a collaboration between five universities, over a dozen industry partners, and the state of Massachusetts to establish a new model of cloud computing. In this model, independent providers offer unbundled services such as compute and storage within a single framework, providing much wider customer choice than existing single-provider clouds while lowering barriers to new and innovative cloud technologies.

What is the Mass. Open Cloud? What are our goals, who are our partners, and what are our plans? This and more will be discussed in our talk, as we describe a vision for the future of cloud computing in which economies of scale coexist peacefully with opportunities for individual innovation.

Available Media


T-Talks 2d

LISA14: Culture

Take Risks... But Don’t Be Stupid

4:00 pm-4:45 pm
Invited Talk

Patrick Eaton, Google, Inc.

Patrick was a member of the Stackdriver engineering team building the intelligent monitoring service for systems built in the public clouds. His work spans architecture, development, and operations. Now an engineer at Google, after the Stackdriver acquisition, Patrick continues to work to build the best monitoring solution for applications in the cloud. Patrick holds a PhD from the University of California at Berkeley for his research in early cloud storage.

The devops culture values taking risks and learning from failure. Of course, there are reasonable risks... and then there is sheer stupidity. A pre-production environment that supports experimentation and exploration can enable responsible risk-taking. As a system grows, however, it is not feasible simply to deploy two of everything. Building an effective pre-production environment becomes an artful exercise in systems-building in its own right. Our production system runs on several hundred instances; our pre-production environment uses just a few dozen. I will describe the techniques and trade-offs we use to run a powerful and effective testbed with minimal resources. You can use these ideas in your own systems.

Available Media


Building the Women@Work Community

4:45 pm-5:30 pm
Invited Talk

Sangeetha Visweswaran, Microsoft

Sangeetha Visweswaran is a Principal Development Lead at Microsoft with 13 years of experience in engineering enterprise management solutions for PCs and mobile devices. Sangeetha graduated with a Masters in Computer Applications in 1999. After graduation Sangeetha joined Microsoft and currently leads the team that designs and engineers solutions to deploy operating systems and manage the server infrastructure in large enterprises. Sangeetha is passionate about growing and developing women in the technical community and currently mentors 5 women one-on-one. She also leads a diverse mentoring ring group and is the president of the women's group in her organization. Sangeetha lives in Redmond, WA with her husband and 2 kids. During her spare time, she loves to bike long distances and explore new routes on her bike.

An outline of how to foster a women's community in the workplace, irrespective of the size of the establishment.

Available Media

T-Talks 3d

LISA14: Dev-Ops

One Year After the healthcare.gov Meltdown: Now What?

4:00 pm-5:30 pm
Invited Talk

Mikey Dickerson, U.S. Citizen

Mikey Dickerson was a Site Reliability Engineer at Google from 2006 to October 2013, when he went on leave to rescue the failing healthcare.gov web site. He was also part of the Obama campaign tech team in Chicago in 2008 and 2012. Prior to Google, he was a systems administrator at Pomona College in Claremont, CA.

From healthcare.gov to Veterans Affairs and a hundred agencies in between, it's not news that the government is bad at technology, and the consequences are getting worse each year. This talk will use the experiences of the ad-hoc band of outsider engineers that are lending their time to look at questions such as: How did we get here? What has worked to turn around failing projects? How do we exploit that experience to change the system?

Available Media


T-Mini Tutorials 1d

LISA14: Dev-Ops

Configuration Management on Windows Server—Desired State Configuration

4:00 pm-5:30 pm
Mini Tutorial

Steven Murawski, Chef

Steven is a Technical Community Manager for Chef and a Microsoft MVP in PowerShell. Steven is a co-host of the Ops All The Things podcast.


T-Mini Tutorials 2d

LISA14: Security

DNS Response Rate Limiting

4:00 pm-5:30 pm
Mini Tutorial

Eddy Winstead, ISC

Eddy has over 20 years of DNS, DHCP and sysadmin experience. He was a systems analyst and hostmaster for the North Carolina Research and Education Network (NCREN) for over a decade. At ISC, Eddy has delivered DNS + DNSSEC consulting, configuration audits and technical training.


LISA Lab Session 6

LISA Lab Office Hours

4:00 pm-5:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Analyzing Log Analysis
S. Alspaugh, University of California, Berkeley and Splunk Inc.


T-Vendor Talks 1d

Vendor Talk: Hyper-dense Virtualization Delivered with XenServer

4:00 pm-4:45 pm
Vendor

Tim Mackey, XenServer

The economics of cloud and desktop virtualization are pushing hypervisors to support hyper-dense virtual machine densities. In this session we'll cover what XenServer is, the performance and scalability improvements in XenServer 6.5, and how you can take full advantage of XenServer to securely deliver hyper-dense virtualization to your organization. Included in the discussion will be a number of deployment considerations required to maximize the potential of XenServer in a modern data center.


CoreOS, An Introduction

4:45 pm-5:30 pm

Brandon Philips, CTO, CoreOS

The architectural patterns of a large scale platform are changing. Dedicated VMs and configuration management tools are being replaced by containerization and new service management technologies like systemd. This presentation will give an overview of CoreOS' key technologies including etcd, fleet, and docker. Come and learn how to use these new technologies to build performant, reliable, large distributed systems.
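As a small taste of the coordination layer, etcd exposes an HTTP key/value API. The sketch below uses only the Python standard library against a placeholder local endpoint, targeting the v2 API that was current at the time:

    import json
    import urllib.parse
    import urllib.request

    # Placeholder endpoint for a local etcd instance speaking the v2 HTTP API;
    # adjust the host and port for your installation.
    ETCD = "http://127.0.0.1:2379"

    # Register a value: PUT /v2/keys/<path> with a form-encoded "value" field.
    data = urllib.parse.urlencode({"value": "10.0.0.5:8080"}).encode()
    req = urllib.request.Request(
        ETCD + "/v2/keys/services/frontend", data=data, method="PUT")
    print(json.load(urllib.request.urlopen(req)))

    # Read it back; other nodes could also watch this key for changes.
    print(json.load(urllib.request.urlopen(ETCD + "/v2/keys/services/frontend")))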

Available Media

 

Friday, November 14, 2014

8:30 am–9:00 am Continental Breakfast Second Floor Foyer

9:00 am–10:30 am Friday
Grand Ballroom A Grand Ballroom B Grand Ballroom C Grand Ballroom D Redwood AB Willow A

F-Talks 1a

LISA14: Syseng

Software is Eating the Network: SDN and Devops

9:00 am-9:45 am
Invited Talk

John Willis, Stateless Networks

John Willis is the VP of Customer Enablement for Stateless Networks. Willis, a 30-year systems management veteran, joined Stateless Networks from Dell where he was Chief DevOps evangelist. Willis, a noted expert on agile philosophies in systems management, came to Dell as part of their Enstratius acquisition. At Enstratius, Willis was the VP of Customer Enablement responsible for product support and services for the multi-cloud management platform. During his career, he has held positions at Opscode and also founded Gulf Breeze Software, an award-winning IBM business partner specializing in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks for IBM on enterprise systems management and was the founder and chief architect at Chain Bridge Systems. He is also a co-author of the “Devops Cookbook” and the upcoming “Network Operations” published by O'Reilly.

In 2011, Marc Andreessen made his now famous statement, "Software is Eating the World". Its most recent appetite is the networking part of the world. Similar to what has been happening in the past decade for server configuration and administration, networking is being redefined by similar abstractions. One of the biggest disruptors is Software Defined Networking. In this presentation we will explore a few of these new abstractions in terms of what they are, how they work, and why you should care. We will also explore the overlap between SDN and Devops. If you are looking for a good overview of SDN, Network Virtualization, NFV, and White Box Switches, this is the presentation for you.

Available Media


Lightning Talks

9:45 am-10:30 am
Lightning Talks

Lee Damon, University of Washington

Have a last-minute brilliant idea you want to propose? A sudden epiphany in the hallway track, or inspiration from a tutorial or workshop you want to share? The lightning talks are the place. No AV, no slides, just you and your idea and a friendly audience. It doesn't have to be a technical topic. Fill out the form below to sign up for a lightning talk.


F-Talks 2a

LISA14: Culture

Embracing Checklists as a Tool for Improving Human Reliability

9:00 am-9:45 am
Invited Talk

Chris Stankaitis, The Pythian Group

Chris Stankaitis is a Team Lead for the Enterprise Infrastructure Services group at Pythian, an organization providing Managed Services and Project Consulting to companies whose data availability, reliability and integrity is critical to their business.

Chris has spent the past 18 years in the systems administration field, with a primary focus on designing and building systems with the highest reliability, availability, and scalability. His experience spans many industries, from ISP/Telco to financial to social media.

In his current role, Chris manages a team of Enterprise Infrastructure Consultants, providing leadership and direction to his team, and the many clients he is privileged to work with everyday.

A pilot cannot fly a plane, and a surgeon cannot cut into a person, without first going through a checklist. These are some of the most well-educated and highly skilled people in the world, and they have embraced the value of checklists as a tool that can dramatically reduce human error. This talk will focus on the value and the (at times) controversial nature of checklist adoption in IT.

Available Media


I Am SysAdmin (And So Can You!)

9:45 am-10:30 am
Invited Talk

Ben Rockwood, Joyent

Ben Rockwood is the Director of Cloud Operations at Joyent, home of SmartOS, Smart Data Center, Node.js, the Manta Storage Service, and the Joyent IaaS Cloud. He has been a system administrator since the early 1990's and has been heavily involved in the Sun Solaris & OpenSolaris communities, as well as in the DevOps & LEAN IT movements of the last several years. He lives in the San Francisco Bay Area with his amazing wife Tamarah and their 5 children.

Systems Administration as a profession is under attack. Industry movements like SRE and DevOps are attempting to redefine Systems Administration from under our feet. Many have even said they aren't SysAdmins anymore. Well, I always have been and always will be a SysAdmin! Let's discuss the changing landscape and bring some sanity to the conversation.

Available Media


F-Talks 3a

LISA14: Syseng

Introducing PMR, a Tool Using procfs, systemd, and cgroups to Perform Minimum-Impact daemon and Library Updates

9:00 am-9:45 am
Invited Talk

David Strauss, Pantheon

If you’ve ever deployed an enterprise website, chances are you’ve benefited from one of the tools David's developed. After co-founding Four Kitchens, a successful web development shop, David found himself gravitating away from custom client work and toward infrastructure solutions. Large clients like Creative Commons, Internet Archive, The Economist, and Wikimedia had already benefited from his scalability and database optimization work. Now, his focus is Pantheon, where he's building an infrastructure to support developers and organizations through building, testing, and deploying content management sites.

As we distribute security information and updates ever-faster, administrators must respond quickly to minimize vulnerability and downtime. But, those are often in conflict because it's hard to identify a more careful way to apply an update than rebooting entire servers.

Fortunately, modern Linux kernels show which executables and libraries are loaded into which running process IDs. systemd uses cgroups to track which process IDs belong to which services. Combined, it's possible to identify exactly what services require restarting or reloading after installing updates to files on disk.

A free, open-source tool called PMR ("The Process Maps Restarter") [https://github.com/pantheon-systems/pmr] automates the demonstrated technique.
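The underlying check is easy to reproduce by hand. The sketch below is not PMR itself, just a minimal standard-library illustration: it flags processes that still map a deleted shared library, and reads a best-guess systemd unit name from each process's cgroup file:

    import os
    import re

    def deleted_mappings(pid):
        """Return deleted shared-library mappings still held by a process."""
        try:
            with open(f"/proc/{pid}/maps") as f:
                return sorted({line.split(None, 5)[5].strip()
                               for line in f
                               if line.rstrip().endswith("(deleted)") and ".so" in line})
        except (FileNotFoundError, PermissionError):
            return []

    def systemd_unit(pid):
        """Best effort: pull the .service name out of /proc/<pid>/cgroup."""
        try:
            with open(f"/proc/{pid}/cgroup") as f:
                m = re.search(r"([^/]+\.service)", f.read())
                return m.group(1) if m else None
        except (FileNotFoundError, PermissionError):
            return None

    for pid in filter(str.isdigit, os.listdir("/proc")):
        stale = deleted_mappings(pid)
        if stale:
            print(pid, systemd_unit(pid), stale[:3])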

Available Media


Simplified Remote Management of Linux Servers

9:45 am-10:30 am
Invited Talk

Russell Doty, Red Hat

Russell Doty is a Technology Strategist and Product Manager at Red Hat. He is currently focused on challenges around systems manageability and security, addressing both technical and usability issues.

He has been heavily involved in the technology of computing, with a background in high performance computing, visualization, and computer hardware. Over the last several years he has worked closely with leading hardware developers and the Open Source community. Helping companies with a strong closed source background learn how to embrace and effectively engage with Open Source is an interesting challenge!

How do you manage a hundred or a thousand Linux servers? With practice! Managing Linux systems is typically done by an experienced system administrator using a patchwork of standalone tools and custom scripts running on each system. There is a better way to work – to manage more systems in less time with less work – and without learning an entirely new way of working.

OpenLMI (the Linux Management Infrastructure program) delivers remote management of production servers – ranging from high end enterprise servers with complex network and storage configurations to virtual guests. Designed to support bare metal servers and to directly manipulate storage, network and system hardware, it is equally capable of managing and monitoring virtual machine guests.

In this session we will show how a system administrator can use the new tools to function more effectively, focusing on how they extend and improve existing management workflows and expertise.
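OpenLMI builds on the CIM/WBEM standards, so any WBEM client can talk to a managed host. The sketch below uses the third-party pywbem library with a placeholder host, credentials, and class name; LMI_Service is an assumption here, so consult the provider documentation for the classes your broker actually exposes:

    import pywbem  # third-party WBEM client library

    # Placeholder host and credentials; 5989 is the conventional WBEM-over-HTTPS port.
    conn = pywbem.WBEMConnection(
        "https://server.example.com:5989",
        ("admin", "secret"),
        default_namespace="root/cimv2",
    )

    # LMI_Service is assumed to be the OpenLMI class for system services.
    for svc in conn.EnumerateInstances("LMI_Service"):
        print(svc["Name"], svc["Started"])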

Available Media


F-Mini Tutorials 1a

LISA14: Dev-Ops

Testing Your PowerShell Scripts

9:00 am-10:30 am
Mini Tutorial

Steven Murawski, Chef

Steven is a Technical Community Manager for Chef and a Microsoft MVP in PowerShell. Steven is a co-host of the Ops All The Things podcast.


F-Mini Tutorials 2a

LISA14: Syseng

Live Upgrades on Running Systems: 8 Ways to Upgrade a Running Service With Zero Downtime

9:00 am-10:30 am
Mini Tutorial

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best known books include Time Management for System Administrators (OReilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration” has just been released.


LISA Lab Session 7

LISA Lab Office Hours

9:00 am-10:30 am

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!


10:30 am–11:00 am Break with Refreshments Second Floor Foyer

11:00 am–12:30 pm Friday
Grand Ballroom A Grand Ballroom B Grand Ballroom C Grand Ballroom D Redwood AB Willow A

F-Talks 1b

LISA14: Security

Keep it Simple, Stupid: Why the Usual Password Policies Don't Work, and What to Do About It

11:00 am-11:45 am
Invited Talk

Abe Singer, Laser Interferometer Gravitational Wave Observatory, Caltech, and Warren Anderson, University of Wisconsin, Milwaukee

Abe Singer is the Chief Security Officer for the Laser Interferometer Gravitational Wave Observatory and the LIGO Scientific Collaboration, and formerly the Chief Security Officer of the San Diego Supercomputer Center. At times he has been a programmer, system administrator, security geek, consultant, and expert witness. He is based at the California Institute of Technology in Pasadena.

Warren Anderson is a Visiting Assistant Professor in the Department of Physics at the University of Wisconsin–Milwaukee and a member of the LIGO Scientific Collaboration, and is effectively the project manager for the LIGO Identity and Access Management Infrastructure. His publications are primarily on black holes and gravitational waves; he has just begun his foray into the world of computer security.

Common password policies don’t really work: they’re annoying, and users still end up with bad passwords. How does one devise a password policy that both manages risk and remains usable? We present the fundamental problem with common password policies and how we approached a solution, looking at the effectiveness of password strength rules in combination with human factors. Our result gives us measurable strength and improves usability, without password aging.

The talk will look at the history of password policies, a formal view of password attacks, the usability issues of passwords, and our experiences with our solution.
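
As a purely illustrative sketch of what "measurable strength" can mean (a hypothetical example, not necessarily the speakers' approach): if passphrases are generated by sampling words uniformly at random from a wordlist, their entropy can be computed exactly rather than inferred from composition rules.

    # Illustrative sketch only: randomly generated passphrases have entropy
    # that can be computed exactly, unlike user-chosen passwords. This is not
    # necessarily the policy described in the talk.
    import math
    import secrets

    def generate_passphrase(wordlist, k=4):
        """Pick k words uniformly at random; entropy = k * log2(len(wordlist)) bits."""
        words = [secrets.choice(wordlist) for _ in range(k)]
        bits = k * math.log2(len(wordlist))
        return " ".join(words), bits

    # Hypothetical tiny wordlist; a real one would have thousands of entries.
    wordlist = ["correct", "horse", "battery", "staple",
                "orbit", "pylon", "gravel", "mimic"]
    phrase, bits = generate_passphrase(wordlist, k=4)
    print(f"{phrase!r} provides about {bits:.1f} bits of entropy")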

Available Media

  • Read more about Keep it Simple, Stupid: Why the Usual Password Policies Don't Work, and What to Do About It

Developers and Application Security: Who is Responsible?

11:45 am-12:30 pm
Panel

Mark Miller, Sonatype

Over the past year, I have become more concerned about software vulnerabilities we unknowingly allow into our homes and lives. What are the implications of networking our kitchen appliances, embedding open source components into everything that we touch? Why are we allowing unmoderated access to our personal information just to play simple games on our mobile devices? What does it mean to have unmonitored computer components running your car? Who is managing and validating the components that now make up 90% of most major software applications?

I am building a community of DevOps and AppSec practitioners who acknowledge these issues, using multiple platforms (video, podcasts, surveys, advocacy programs) to promote the active monitoring of open source, component-based projects.

In early 2014, an alliance of security organizations including Cigital, DevOps Weekly, DevOps Days, HP, Sonatype, DevOps Cafe and the Trusted Software Alliance sponsored a study to determine who is responsible when it comes to security within the development lifecycle. We will present the results of our findings. The presentation will include open discussion with sponsors of the survey, highlighting some of the disturbing findings and how we can begin to build security assurance into the SDLC.

Each attendee will receive a copy of the survey along with analysis notes.

Available Media

  • Read more about Developers and Application Security: Who is Responsible?

F-Talks 2b

LISA14: Metrics

Gauges, Counters, and Ratios, Oh My!

11:00 am-12:30 pm
Invited Talk

Caskey L. Dickson, Google, Inc.

Caskey Dickson is a Site Reliability Engineer/Software Engineer at Google, where he works on writing and maintaining monitoring services that operate at "Google scale." Before coming to Google he was a senior developer at Symantec, wrote software for various internet startups such as CitySearch and CarsDirect, ran a consulting company, and taught undergraduate and graduate computer science at Loyola Marymount University. He has an undergraduate degree in Computer Science, a Masters in Systems Engineering, and an MBA from Loyola Marymount.

The only thing worse than no metrics is bad or misleading ones. Well-designed metrics enable you to quickly know the state of your service and to have confidence that your systems are healthy. Poor metrics distract you from finding the root causes of outages and extend downtime. Unfortunately, it isn't always obvious what counts and how to count it. This talk will cover the essential attributes of quality metrics and walk participants through the steps needed to capture them in a useful format while avoiding common pitfalls in metric design.
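
To make the distinction in the title concrete, here is a minimal, hypothetical sketch (not material from the talk) of the three kinds of values: a counter only ever increases, a gauge is a point-in-time reading, and a rate or ratio is derived from counter samples over an interval.

    # Illustrative sketch of metric types; names and structure are hypothetical.

    class Counter:
        """Monotonically increasing, e.g. total requests served since start."""
        def __init__(self):
            self.value = 0

        def inc(self, amount=1):
            self.value += amount

    class Gauge:
        """Point-in-time reading, e.g. current queue depth or temperature."""
        def __init__(self):
            self.value = 0

        def set(self, value):
            self.value = value

    def rate(sample_then, sample_now, seconds):
        """A ratio derived from two counter samples taken `seconds` apart."""
        return (sample_now - sample_then) / seconds

    requests = Counter()
    requests.inc(42)              # e.g. a batch of 42 requests served
    queue_depth = Gauge()
    queue_depth.set(7)            # current backlog, read directly

    # Two hypothetical counter samples taken 10 seconds apart.
    print("requests/sec:", rate(1400, 1650, seconds=10))   # -> 25.0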

Available Media

  • Read more about Gauges, Counters, and Ratios, Oh My!

F-Talks 3b

LISA14: Syseng

Taming Operations in the Apache Hadoop Ecosystem

11:00 am-11:45 am
Invited Talk

Kathleen Ting and Jonathan Hsieh, Cloudera, Inc.

Kathleen Ting (@kate_ting) is currently a technical account manager at Cloudera where she helps strategic customers deploy and use the Apache Hadoop ecosystem in production. She's a frequent conference speaker, has contributed to several projects in the open source community, and is a committer and PMC member on Apache Sqoop. Kathleen is also a co-author of O’Reilly’s Apache Sqoop Cookbook.

Jonathan Hsieh is a Software Engineer and HBase Team Tech Lead at Cloudera. He is an Apache HBase committer and PMC member and a committer and founder of Apache Flume. He has spoken at many conferences including Hadoop World, Hadoop Summit, HBaseCon and the USENIX NSDI Conference. Jonathan has an M.S. in Computer Science from University of Washington, an M.S. and a B.S. in Electrical and Computer Engineering from Carnegie Mellon University.

The Apache Hadoop stack includes many distributed storage and processing systems, running on clusters ranging from tens to thousands of nodes. At Cloudera, we’ve been supporting tens of thousands of nodes across hundreds of our customers’ production clusters, with diverse use cases. For five years we have been helping sysadmins manage, tune, and debug these systems. We'll describe a methodology for debugging and tuning across the different layers (application, Hadoop, JVM, kernel, networking). We’ll also talk about new tools and subsystems included in our operational best practices to keep your clusters always up, running, and secure.

Available Media

  • Read more about Taming Operations in the Apache Hadoop Ecosystem

Getting a Hold of the Hype: Making Containers Useful with Project Atomic

11:45 am-12:30 pm
Invited Talk

Brian Proffitt, Red Hat

Brian Proffitt is a Community Liaison for the oVirt Project at Red Hat and helped launch Project Atomic in 2014. The author of 22 books on Linux, iOS, and even a brief work on Plato, Brian is an adjunct instructor at the University of Notre Dame, living in his native Indiana with his wife and three daughters.

Virtualization was once the next Big Thing. Then cloud. Now containers are at the peak of hype, led by the excitement surrounding Docker. But is this hype justified, or can the innovation be tempered and improved by better management and control? This is the problem Project Atomic hopes to solve.

Available Media

  • Read more about Getting a Hold of the Hype: Making Containers Useful with Project Atomic

F-Mini Tutorials 1b

LISA14: Metrics

High-Speed Network Traffic Monitoring Using ntopng

11:00 am-12:30 pm
Mini Tutorial

Luca Deri, ntop / IIT-CNR

Luca Deri is the leader of the ntop project, aimed at developing an open-source monitoring platform. He previously worked for University College London and IBM Research prior to receiving his PhD from the University of Berne. When not working at ntop, he shares his time between the .it Internet Domain Registry (nic.it) and the University of Pisa, where he has been appointed as a lecturer in the CS department.

Available Media

  • Read more about High-Speed Network Traffic Monitoring Using ntopng

F-Mini Tutorials 2b

LISA14: Culture

Establishing IT Project Management Culture: Nerdherding On the Frontier

11:00 am-12:30 pm
Mini Tutorial

Adele Shakal, Cisco

Since obtaining her BS in Geochemistry from the California Institute of Technology, Adele Shakal has spent two decades in the IT industry, including webmastering, UNIX systems administration, technical project management, and IT emergency operations planning. She currently leads project and knowledge management at Metacloud, now part of Cisco, which offers private clouds based on OpenStack. She has presented at local LOPSA chapter meetings and at technical conferences including USENIX's LISA, O'Reilly's Velocity, and CascadiaIT.

  • Read more about Establishing IT Project Management Culture: Nerdherding On the Frontier

LISA Lab Session 8

LISA Lab Office Hours

11:00 am-12:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Human Reliability Checklists
Chris Stankaitis, The Pythian Group

Openstack and Ansible
Jesse Keating, Rackspace

  • Read more about LISA Lab Office Hours

12:30 pm–2:00 pm Conference Lunch Metropolitan Ballroom

2:00 pm–3:30 pm Friday
Grand Ballroom A Grand Ballroom B Grand Ballroom C Grand Ballroom D Redwood AB Willow A

F-Talks 1c

LISA14: Metrics

Testing Storage Systems: Methodology and Common Pitfalls

2:00 pm-2:45 pm
Invited Talk

John Constable, Wellcome Trust Sanger Institute

John Constable is a sysadmin in the High Performance Computing team at the Wellcome Trust Sanger Institute, which hosts 23PB of storage. He has worked on storage evaluation and management (both equipment and system administration) and has evaluated seven storage solutions over the past few years. He lives in Cambridge, UK.

Vendors always have the latest system they would like you to pay for. Let's face it: even if the system is free, there is still a cost to implement, manage, and decommission it. This talk will cover the negative, functional, and performance testing that the Wellcome Trust Sanger Institute does to evaluate these areas within a reasonable time scale. The talk will also highlight some of the issues we run into while testing, along with some tips and tricks we've developed after testing several types of vendor equipment over the past three years.

Available Media

  • Read more about Testing Storage Systems: Methodology and Common Pitfalls

Why Test Driven Development Works for SysAdmins

2:45 pm-3:30 pm
Invited Talk

Garrett Honeycutt, LearnPuppet.com

Garrett Honeycutt has been hacking *nix-based systems and spreading the merits of open source software for over 15 years. He began using Puppet in 2007 while building out a national carrier-grade VoIP system. Previously he has built core internet infrastructure for an ISP, created mobile media distribution platforms, worked as a Professional Services Engineer with Puppet Labs helping customers around the world with Puppet, DevOps processes, and project management, and served as the Puppet Architect at Ericsson in Stockholm, where he coordinated with and mentored those writing Puppet code for global R&D sites supporting over 30,000 developers.

This talk demonstrates the value of Test Driven Development to those who do not self-identify as developers. We will discuss questions such as why we should test, why we should test first, and what to test, as well as practices and tools that help.
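
For a flavor of the test-first idea applied to small sysadmin helpers, here is a minimal, hypothetical Python example (the talk's own examples may well use configuration-management-specific tooling): the test specifies the behavior before the function is written.

    # Hypothetical test-first example: specify the behaviour of a tiny
    # /etc/hosts line formatter first, then write just enough code to pass.
    import unittest

    def hosts_line(ip, hostname, aliases=()):
        """Render one /etc/hosts entry; written only after the tests below."""
        return "\t".join([ip, hostname, *aliases])

    class TestHostsLine(unittest.TestCase):
        def test_basic_entry(self):
            self.assertEqual(hosts_line("10.0.0.5", "db01"), "10.0.0.5\tdb01")

        def test_entry_with_aliases(self):
            self.assertEqual(
                hosts_line("10.0.0.5", "db01", ("db", "primary")),
                "10.0.0.5\tdb01\tdb\tprimary",
            )

    if __name__ == "__main__":
        unittest.main()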

Available Media

  • Read more about Why Test Driven Development Works for SysAdmins

F-Talks 2c

LISA14: Syseng

IP Traffic Visualizers from Utah State University

2:00 pm-2:45 pm
Invited Talk

Eldon Koyle, Utah State University

Eldon Koyle has been at Utah State University for the past 7 years. During that time, Eldon has shifted from Linux systems administrator to network administrator. His biggest interests include open-source software and open standards.

A brief overview of two IP visualization tools created at Utah State University. The first tool displays each address in a network block in a grid (up to a /16). The second tool shows traffic as blobs flowing between IP addresses.
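
To illustrate the grid idea (a hypothetical sketch, not the USU tools themselves): a /16 contains 65,536 addresses, so the last two octets of each address index a 256-by-256 grid directly.

    # Illustrative sketch: map an address inside a /16 to a (row, column)
    # cell on a 256x256 grid. The actual USU visualizers may differ.
    import ipaddress

    def grid_cell(addr, network="10.1.0.0/16"):
        net = ipaddress.ip_network(network)
        offset = int(ipaddress.ip_address(addr)) - int(net.network_address)
        if not 0 <= offset < net.num_addresses:
            raise ValueError(f"{addr} is not inside {network}")
        return divmod(offset, 256)   # (third octet, fourth octet)

    print(grid_cell("10.1.2.42"))      # -> (2, 42)
    print(grid_cell("10.1.77.200"))    # -> (77, 200)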

Available Media

  • Read more about IP Traffic Visualizers from Utah State University

The Top 5 Things I Learned While Building Anomaly Detection Algorithms for IT Ops

2:45 pm-3:30 pm
Invited Talk

Toufic Boubez, Metafor Software

Toufic has been passionate about machine learning for over 20 years. Prior to Metafor, he was the founder and CTO of Layer 7 Technologies, a leader in API security and management that was recently acquired by CA. Prior to Layer 7, Toufic was the founding CTO of Saffron Technology, a big data analytics company specializing in associative memory technology. Toufic is also a well-known SOA and Web Services pioneer and was Chief Architect for Web Services at IBM’s Software Group. He was the co-editor of the W3C WS-Policy specification and the co-author of the OASIS WS-Trust, WS-SecureConversation, and WS-Federation submissions. He is the author of many publications, articles, and several books, and is one of the co-authors of the SOA Manifesto. Toufic holds a Master of Electrical Engineering degree from McGill University and a Ph.D. in Biomedical Engineering from Rutgers University.

Most IT Ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this haystack of data and extracting signal from the noise is not easy and generates too many false positives.

In this talk I will show some of the types of anomalies commonly found in dynamic data center environments and discuss the top 5 things I learned while building algorithms to find them. You will see how various Gaussian-based techniques work (and why they don’t!), and we will go into some non-parametric methods that you can use to great advantage.
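
As a taste of the non-parametric side (an illustrative sketch, not code from the talk): instead of assuming a Gaussian and thresholding on standard deviations, one can flag points that sit too many median absolute deviations away from a rolling median, which is more robust to the spiky, non-normal data typical of operations metrics.

    # Illustrative sketch: robust, non-parametric outlier flagging with a
    # rolling median and median absolute deviation (MAD). Window size and
    # threshold are arbitrary choices for the example.
    from statistics import median

    def mad_outliers(series, window=20, threshold=5.0):
        """Yield (index, value) for points far from the recent median."""
        for i in range(window, len(series)):
            recent = series[i - window:i]
            med = median(recent)
            mad = median(abs(x - med) for x in recent) or 1e-9
            if abs(series[i] - med) / mad > threshold:
                yield i, series[i]

    # Mostly steady latency measurements with one obvious spike.
    latencies = [10, 11, 9, 10, 12, 10, 11, 10, 9, 10,
                 11, 10, 12, 11, 10, 9, 10, 11, 10, 12,
                 10, 11, 95, 10, 11]
    print(list(mad_outliers(latencies)))   # -> [(22, 95)]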

Available Media

  • Read more about The Top 5 Things I Learned While Building Anomaly Detection Algorithms for IT Ops

F-Talks 3c

LISA14: Metrics

Linux Performance Analysis: New Tools and Old Secrets

2:00 pm-2:45 pm
Invited Talk

Brendan Gregg, Netflix

Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, analysis, and tuning. He is the author of the book "Systems Performance", and recipient of the USENIX 2013 LISA Award for Outstanding Achievement in System Administration. Previously a performance and kernel engineer, his recent work includes developing visualizations and methodologies for performance analysis, and tools which are included in multiple operating systems.

At Netflix, performance is crucial, and we use many high- to low-level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit and which are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
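
For a flavor of the kernel features mentioned above, here is a minimal sketch of driving ftrace directly through its control files (assuming root privileges and the traditional mount point at /sys/kernel/debug/tracing; this is not one of the talk's tools).

    # Illustrative sketch: a tiny ftrace session using the tracefs control
    # files. Requires root; the function_graph tracer must be compiled into
    # the running kernel.
    import time

    TRACING = "/sys/kernel/debug/tracing"

    def write(name, value):
        with open(f"{TRACING}/{name}", "w") as f:
            f.write(value)

    write("current_tracer", "function_graph")   # trace kernel function calls
    write("tracing_on", "1")
    time.sleep(0.1)                             # capture a short window
    write("tracing_on", "0")

    with open(f"{TRACING}/trace") as f:         # read the captured buffer
        for i, line in enumerate(f):
            print(line.rstrip())
            if i > 20:                          # just show a sample
                break

    write("current_tracer", "nop")              # restore the default tracer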

Available Media

  • Read more about Linux Performance Analysis: New Tools and Old Secrets

Is Your Team Instrument Rated? (Or Deploying 89,000 Times per Day)

2:45 pm-3:30 pm
Invited Talk

J. Paul Reed, Release Engineering Approaches

J. Paul Reed has over a decade of experience in the trenches as a build/release and tools engineer, working with such organizations as VMware, Mozilla, and Symantec. In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's been able to work across a number of industries, from financial services to cloud-based infrastructure, with teams from 2 to 200. Paul is also a founding host of The Ship Show, a twice-monthly podcast tackling topics related to build engineering, DevOps, and release management.

As DevOps matures from craft, through trade, to a science, we are starting to work on distilling out how we can make DevOps' implementation and the organizational transformation repeatable and predictable, across all kinds of environments. As part of that search, it is time to start looking at humanity's other "operational" endeavors and see what is applicable to DevOps.

This talk examines one of the largest operational systems built to date: the national airspace system. We will look at specific aspects of how controllers (operations teams) work with pilots (developers) to safely move millions of passengers (customers) every year, with an incident rate that would make any development shop jealous.

Available Media

  • Read more about Is Your Team Instrument Rated? (Or Deploying 89,000 Times per Day)

F-Mini Tutorials 1c

LISA14: Syseng

Reliable Replicated File Systems with GlusterFS

2:00 pm-3:30 pm
Mini Tutorial

John Sellens, SYONEX

John Sellens has been involved in system and network administration for over 25 years, and has been teaching and writing on related topics for many years. He holds an M.Math. in computer science from the University of Waterloo. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the operations team at FreshBooks.

  • Read more about Reliable Replicated File Systems with GlusterFS

F-Mini Tutorials 2c

LISA14: Security

DON'T PANIC: Managing Incident Response

2:00 pm-3:30 pm
Mini Tutorial

Abe Singer, Laser Interferometer Gravitational Wave Observatory, Caltech

Abe Singer is the Chief Security Officer for the Laser Interferometer Gravitational Wave Observatory and the LIGO Scientific Collaboration, and formerly the Chief Security Officer of the San Diego Supercomputer Center. At times he has been a programmer, system administrator, security geek, consultant, and expert witness. He is based at the California Institute of Technology in Pasadena.

  • Read more about DON'T PANIC: Managing Incident Response

LISA Lab Session 9

LISA Lab Office Hours

2:00 pm-3:30 pm

The LISA Lab will offer continued training from speakers and instructors, as well as give attendees the chance to investigate and test new technologies, watch demos, participate in live experiments, and mentor others.

Attending Office Hours or not—the Lab is open for all, so stop by to check it out!

Hadoop Operations and Debugging
Kathleen Ting and Jonathan Hsieh, Cloudera, Inc.

  • Read more about LISA Lab Office Hours

3:30 pm–4:00 pm Break with Refreshments Second Floor Foyer

4:00 pm–5:30 pm Friday

F-Keynote

LISA14: Culture

Transforming to a Culture of Continuous Improvement

4:00 pm-5:30 pm
Closing Session

Courtney Kissler, Vice President of E-Commerce and Store Technologies, Nordstrom

We went from an agile transformation, to a leadership development program (which became an exercise in defining our culture), to a technology strategy story that includes creating a culture of continuous improvement.

Courtney began her career at Nordstrom in 2002 as a security engineer and by 2004 had moved into a leadership role supporting the Direct operations team. After occupying various leadership roles in infrastructure/operations, she moved into roles supporting the Integration Competency Center, Corporate Center delivery teams, and E-commerce program management. In 2012, Courtney assumed her current role supporting program management, delivery and support for the in-store customer mobile app team, innovation lab and continuous improvement.

Available Media

  • Read more about Transforming to a Culture of Continuous Improvement

© USENIX

  • Privacy Policy
  • Contact Us

LISA is a registered trademark of the USENIX Association.