Full Training Program

Sunday, December 9, 2012

Full Day


S1
Securing Linux Servers
Rik Farrow, Security Consultant

Rik Farrow (S1, M1) began working with UNIX system security in 1984 and with TCP/IP networks in 1988. He taught his first security class in 1987 and started teaching internationally the following year. He has been a consultant since 1980 and has advised both firewall and intrusion detection companies in the design of their products. Rik has published two books, one on UNIX security and the other on system administration. He designed a five-day, hands-on class in Internet security for internal use by the Department of Defense. He wrote the "Network Defense" column for Network Magazine for over six years and is currently the editor of ;login:, the USENIX magazine. Rik lives with his wife in the high desert of northern Arizona, where he continues to work and do research, and he still ventures out to teach.

9:00 am–5:00 pm

Nautilus 3

This tutorial is also available for live streaming. 

Linux servers start out very secure: it's what you do with them when you use them that can create insecurities. A straight Linux server install runs minimal services and has few users and a very restrictive firewall, which is a great security posture but is pretty useless for most purposes. As users are added, services enabled, and holes punched through the firewall, the security can deteriorate quickly.

This class will show you how to maintain a strong security posture through careful configuration and proper use of Linux tools and services. Linux contains tools and software that can be enabled to slow brute-force attacks against user accounts, can notice when your accounts have weak passwords or are under attack, can keep services and software up to date, and can sandbox applications to prevent even zero-day attacks. The class will focus on attacks most recently seen, including attacks on mail and Web servers.
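As a small taste of the monitoring side of this, the following sketch (not taken from the class materials; the sample log lines are invented) counts failed SSH logins per source address, which is the same signal that tools such as fail2ban act on:

```shell
# Build a tiny sample sshd log; on a real server you would read
# /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (Red Hat) instead.
cat > /tmp/auth.sample <<'EOF'
Dec  9 10:00:01 host sshd[123]: Failed password for root from 203.0.113.5 port 4222 ssh2
Dec  9 10:00:02 host sshd[124]: Failed password for invalid user admin from 203.0.113.5 port 4223 ssh2
Dec  9 10:00:03 host sshd[125]: Accepted password for alice from 198.51.100.7 port 4224 ssh2
Dec  9 10:00:04 host sshd[126]: Failed password for root from 203.0.113.5 port 4225 ssh2
EOF

# Count failed logins per source IP, worst offender first.
grep 'Failed password' /tmp/auth.sample |
  sed -E 's/.* from ([0-9.]+) .*/\1/' |
  sort | uniq -c | sort -rn
```

Addresses that cross a threshold here are candidates for a firewall block or a fail2ban jail.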

Who should attend:

Linux system administrators and security managers familiar with Linux system administration, whether you manage a handful of Linux systems or entire clusters.

Take back to work:

Techniques for securing and maintaining Linux servers.

Topics include:
  • Minimizing risk with appropriate restrictions
  • Managing and tracking application vulnerabilities
  • Sandboxing to prevent attacks
  • Monitoring logfiles
  • Updates and configuration management

 



S2
Using Amazon Web Services
William LeFebvre, CSE

William LeFebvre (S2) has been banging on UNIX systems since 1983, and was first exposed to computer networking with the original ARPANet. He has been teaching at conferences since 1991. William is currently the chief architect for the digital group at Career Sports and Entertainment. He designs and creates production web environments, and consults with clients on a wide range of Internet technologies.

 
Marc Staveley

Marc Staveley (S2) is an independent consultant who applies his years of experience with UNIX development and administration to helping clients with server consolidation and application migration projects. Previously he held positions at SOMA Networks, Sun Microsystems, NCR, and Princeton University. He is a frequent speaker on standards-based development, multi-threaded programming, system administration, and performance tuning.

9:00 am–5:00 pm

Nautilus 2

Amazon offers a solid collection of cloud services through Amazon Web Services (AWS). These include virtual machines and storage, load balancers, replicated databases, content delivery, and automatic scaling and monitoring. AWS provides a very rich API for building applications that use these services, but the actual user interfaces can be difficult to master. This tutorial introduces Amazon Web Services, describes how the more popular services fit together to support an infrastructure, and provides in-depth instruction on the user-level interfaces to the more popular services: EC2, EBS, ELB, S3, CloudFront, and others. Three interfaces will be taught: the Web console, the Firefox plug-in Elasticfox, and the command-line tools. Cautions and pitfalls will be presented along the way to help students avoid the common mistakes of first-time AWS users.
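To give a flavor of the command-line tools that will be covered, a short session with Amazon's EC2 API tools might look like the following. This is an illustrative sketch, not part of the course materials: it assumes configured AWS credentials, and every ID and name in it is a placeholder.

```shell
# Launch one small instance from a placeholder AMI, using an existing key pair
ec2-run-instances ami-12345678 -n 1 -t m1.small -k my-keypair

# List instances, their state, and their public DNS names
ec2-describe-instances

# Create a 10 GB EBS volume and attach it to the new instance
ec2-create-volume -s 10 -z us-east-1a
ec2-attach-volume vol-87654321 -i i-11223344 -d /dev/sdf
```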

Who should attend:

System administrators who currently use or are considering the use of Amazon Web Services (AWS), as well as individuals who are tasked with supporting AWS for production services, especially if they are unfamiliar or uncomfortable with the command-line tools and the Web-based interfaces supplied by Amazon. Experience with cloud computing is not required. Experience with the installation and support of basic tools and languages (especially Java and Ruby) would be beneficial. Time will only permit us to study the creation and support of Linux instances in the Amazon cloud.

Take back to work:

Knowledge of the techniques, pitfalls, commands, and programs that will help you make effective use of the Amazon cloud.

Topics include:
  • Introduction to AWS
  • Elastic Compute Cloud (EC2)
  • Elastic Block Store (EBS)
  • Simple Storage Service (S3)
  • Elastic Load Balancing (ELB)
  • Relational Database Service (RDS)
  • More in-depth topics: accessing EC2 instance data from within the instance, boot-time scripts in common AMIs

 



SuperSysadmin
S3
Seven Habits of the Highly Effective System Administrator: Hints, Tricks, Techniques, and Tools of the Trade
Lee Damon, University of Washington

Lee Damon (S3) has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04, co-chaired CasITconf '11, and is co-chairing CasITconf '13.

 
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

9:00 am–5:00 pm

Nautilus 5

We aim to accelerate the experience curve for junior system administrators by teaching them the tricks (and effective coping strategies) that experienced administrators take for granted and which are necessary for successful growth of both the administrator and the site.

The class covers many of the best practices that senior administrators have long incorporated in their work. We will touch on tools you should use, as well as tools you should try to avoid. We will touch on things that come up frequently, as well as those which happen only once or twice a year. We will look at a basic security approach.

 

Who should attend:

Junior system administrators with little to 3+ years of experience in computer system administration. We will focus on enabling the junior system administrator to "do it right the first time." Some topics will use UNIX-specific tools as examples, but the class is applicable to any sysadmin and any OS. Most of the material covered is "the other 90%" of system administration—things every sysadmin needs to do and to know, but which aren't details of specific technical implementation.

Take back to work:

Ideas about how to improve and to streamline your systems and your workload, and, just as important, where to look to find more answers.

Topics include:
  • The five things every site should know
  • Why your computers should all agree on what time it is
  • Why root passwords should not be the same on every computer
  • Why backing up every file system on every computer is not always a good idea
  • Policies—where you want them and where you might want to avoid them
  • Ethical issues
  • Growth and success as a solo-sysadmin as well as in small, medium, and large teams
  • Training, mentoring, and personal growth planning
  • Site planning and roadmaps
  • Budgeting
  • Logistics
  • Books that can help you and your users

Half Day Morning


Cloud Computing
S4
VMware vCloud Concepts, Technology, and Operations
New!
John Arrasjid, VMware

John Arrasjid (S4) is a Principal Architect at VMware, specializing in Enterprise Private Cloud (vCloud), business continuity, disaster recovery, and virtual datacenter architecture design, and is the lead architect for the vCloud Architecture Design Toolkit (vCAT). He has over 20 years of IT and consulting experience. John regularly presents at conferences (VMworld, VMware Partner Exchange, USENIX Annual Technical Conference, and USENIX LISA). John is currently serving as Vice President on the USENIX Board of Directors and has published three books in the SAGE Short Topics Series: Cloud Computing with VMware vCloud Director, Foundation for Cloud Computing with VMware vSphere 4, and Deploying the VMware Infrastructure. John is a founding member of VMware bands Elastic Sky and The Hypervisors. John holds both VCP and VCDX (#001) certifications. John has a Bachelor of Science in Computer Science from SUNY Buffalo, NY.

 
Ben Lin, VMware

Ben Lin (S4, S7) is a vCloud Solutions Specialist for the VMware Global Accounts team. He has been closely involved with vCloud solutions, having performed some of the first designs and deployments worldwide. Ben holds VCDX3/4 certification and is an author of Cloud Computing with VMware vCloud Director. Ben was a document lead for vCAT and helped create the vCloud Accelerator service kit used by field and partners worldwide. He has presented at the VMworld 2011, VMworld Europe 2011, Partner Exchange 2012, Partner Exchange 2011, HotCloud '11, LISA '11, and LISA '10 conferences.

9:00 am–12:30 pm

Marina 6

This class will focus on the latest vCloud technologies. The technology, design, implementation, and management will be covered concisely, and various aspects will be demonstrated.

This is a vendor-specific class. VMware-specific details will be covered, but concepts may be applicable to other cloud computing technologies. This session will cover the VMware perspective on implementing a cloud solution, the technologies used, third-party integration considerations, and operational concepts.

Who should attend:

Consumers, system administrators, cloud operators, and cloud architects who are interested in learning about the VMware vCloud solution, which is based on NIST guidelines. Experience with VMware vSphere, VMware Chargeback, and Distributed Virtual Switches is preferred but not required.

Take back to work:
  • How VMware vCloud implements cloud functionality based on the NIST guidelines
  • The basics of developing a cloud solution with VMware
  • An understanding of the concepts and technology used in a vCloud-based cloud computing infrastructure
  • How to utilize the VMware vCloud Architecture Toolkit (vCAT) for learning, architecting, operating, and consuming a VMware vCloud
Topics include:
  • VMware vCloud core concepts and features
  • The VMware suite of technologies used to stand up cloud computing environments, including public/private/hybrid and on/off premises
  • VMware vCloud mechanisms for multi-tenancy, resource elasticity, segmentation of resources, and provisioning mechanisms through the use of VMware vCloud Director, vShield, vCenter Chargeback, vCloud Connector, and other technologies
  • Sections of the vCAT materials and how they are used by architects, operators, and consumers of a vCloud-based deployment
  • Demonstration of features


Cloud Computing
S5
RPM Packaging for Sysadmins
David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years and acted as a consultant for an additional 3 years. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

9:00 am–12:30 pm

Nautilus 4

Packaging software is a must for consistent and automated system provisioning and maintenance, yet very few people do so. We will explore why sysadmins should package software, what benefits accrue to software packagers, and how to package your software.
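To give a sense of scale, a complete spec file for packaging a single script can be quite small. The following is a hypothetical example (every name, path, and email in it is illustrative):

```
Name:           mytool
Version:        1.0
Release:        1%{?dist}
Summary:        Site-local administration script
License:        MIT
Source0:        mytool-1.0.tar.gz
BuildArch:      noarch

%description
A hypothetical script packaged for consistent, automated deployment.

%prep
%setup -q

%install
mkdir -p %{buildroot}%{_bindir}
install -m 0755 mytool %{buildroot}%{_bindir}/mytool

%files
%{_bindir}/mytool

%changelog
* Sun Dec 09 2012 Jane Admin <jane@example.com> - 1.0-1
- Initial package
```

Given a matching source tarball in ~/rpmbuild/SOURCES, `rpmbuild -ba mytool.spec` produces both source and binary RPMs.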

Who should attend:

System administrators with a modicum of experience who have an interest in further automating their environments, and advanced sysadmins who have little or no experience with packaging.

Take back to work:

An understanding of the benefits of packaging and the ability to begin packaging immediately.

Topics include:
  • Automation
  • Consistency
  • Packaging's serendipitous benefits
  • How to package software
  • Tools to make packaging easier


S6
Agile Software Development: Getting It Out the Door Successfully
Geoff Halprin, The SysAdmin Group

Geoff Halprin (S6, S9, M12) has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the SAGE Short Topics book A System Administrator's Guide to Auditing and was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

9:00 am–12:30 pm

Nautilus 1

The days of monolithic software development are gone. Agile (a combination of ideas from extreme programming and lean manufacturing) is now the normal practice, even amongst the largest corporations. Agile flips traditional software development on its head, moving everything into small time-boxed iterations of a few weeks. Imagine shipping software every few weeks!

Who should attend:

Programmers who wish to learn a better way to deliver software; programmers who have heard of agile, but don’t understand what the fuss is all about; sysadmins who wish to learn a simple set of techniques to help them maintain their scripting and infrastructure automation.

Take back to work:

An understanding of all of the key concepts for agile software development; an immediate ability to apply these concepts and improve your delivery.

Topics include:
  • The history of agile development
  • Key concepts: iterations, stories, planning, showcases, retrospectives, Kanban, and visible workflow
  • How they all come together into a cohesive, radically simpler way to develop software

Half Day Afternoon


Cloud Computing
S7
vCloud Architecture Design with vCAT (vCloud Architecture Toolkit)
New!
Ben Lin, VMware


 
David Hill, VMware

David Hill (S7, M8) is a Senior Solutions Architect working at VMware, specializing in cloud computing, disaster recovery, and virtualization. He is a VMware Certified Advanced Professional (VCAP) and a VMware Certified Professional (VCP). David is a lead architect for the vCloud Architecture Toolkit. Before joining VMware, he was a self-employed IT consultant and architect for about fifteen years, working on projects for large consultancies and financial institutions.

1:30 pm–5:00 pm

Marina 6

This is a vendor-specific class. VMware vCloud is a suite of VMware technologies used to stand up cloud computing environments, including public/private/hybrid and on/off premises. VMware vCloud provides multi-tenancy, resource elasticity, segmentation of resources, and provisioning mechanisms through the use of VMware vCloud Director, vShield, vCenter Chargeback, vCloud Connector, and other technologies. Experience with VMware vSphere, VMware Chargeback, and Distributed Virtual Switches is preferred but not required. It is recommended, but not required, that attendees have previous knowledge of vCloud or have attended "VMware vCloud Concepts, Technology, and Operations."

Who should attend:

Cloud architects and cloud operators who are interested in deploying a VMware vCloud.

Take back to work:

The knowledge needed to deploy a VMware Cloud for use as an enterprise private cloud.

Topics include:
  • VMware vCloud core concepts and features
  • Architecture design considerations
  • vCloud architecture design patterns and best practices
  • Demonstration of features
S8
MongoDB: NoSQL Operations Hands On
New!
1:30 pm–5:00 pm

Nautilus 4

MongoDB is an open-source, document-oriented, NoSQL database designed with both scalability and agility in mind. The goal of MongoDB is to bridge the gap between key-value stores (which are fast and scalable) and relational databases (which have rich functionality).

Attendees will work through several model operational scenarios, covering planned and unplanned maintenance tasks, backup and recovery processes, responses to database growth requirements, and more!
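For example, initiating a three-member replica set, one of the maintenance scenarios covered, looks roughly like this from the mongo shell. This is a sketch rather than the class exercise itself: the hostnames are placeholders, and each mongod is assumed to have been started with --replSet rs0.

```shell
# Initiate a three-member replica set from any one of the members
mongo --host db1.example.com --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})'

# Confirm that a primary has been elected
mongo --host db1.example.com --eval 'printjson(rs.status())'
```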

Who should attend:

Anyone interested in learning about operating a MongoDB deployment.

Take back to work:

What you need to know to successfully deploy and maintain a MongoDB database, diagnose performance issues, import and export data from MongoDB, and establish the proper backup and restore routines.

Topics include:
  • Setting up replica sets
  • Migrating servers to different hosts
  • Repairing a deployment after hardware failure
  • Scaling out with auto-sharding


S9
Introduction to Provisioning
New!
Geoff Halprin, The SysAdmin Group


1:30 pm–5:00 pm

Nautilus 1

As sysadmins, provisioning is one of the most obvious, basic, and important things we do. This simple topic covers areas as diverse as establishing the right standards, automated delivery (where applicable), ensuring software license compliance, and, oh, actually building the machine (physical or virtual).

For such an obvious part of our duties, it is surprising just how much thought goes into effective provisioning. This tutorial will cover a wide gamut of issues, many of which must be performed long before the machine is ever built.

 

Who should attend:

Sysadmins, IT/IS managers, and any other folks responsible for new systems (real or virtual), installations, and integrations; sysadmins looking towards the cloud, infrastructure automation, and more maintainable systems.

 

Take back to work:

A deep understanding of the provisioning process and its context in the wider infrastructure lifecycle; processes and best practices for efficient and timely roll-out and integration of systems.

Topics include:
  • The infrastructure lifecycle 
  • The three parts to provisioning: planning, pre-provisioning, and provisioning 
  • Automation 
  • The MUST haves (Mandatory Ubiquitous Service Transport) 
  • Management traffic: in-band, out-of-band, and image distribution 
  • The build (physical and virtual) 
  • Software licensing and other topics you probably didn’t think of
Monday, December 10, 2012

Full Day


M1
SELinux (Security-Enhanced Linux)
Rik Farrow, Security Consultant


9:00 am–5:00 pm

Nautilus 4

SELinux provides an extra layer of security for most Linux systems—if you leave it enabled. Most commonly, SELinux gets disabled as the first step when debugging system problems, even when it is not the problem. SELinux can stop many attacks, even previously unknown (zero-day) attacks, as it confines applications' access to files, directories, commands, and network sockets.

This class will show you how to work with SELinux: how to determine if SELinux is blocking an application and how to adjust policy to move beyond problems. SELinux includes many tools for viewing audit logs, file and process contexts, modifying policy, and even interpreting log messages, and you will learn how to use these tools. You will learn how to modify file contexts, add new policy, monitor logs both graphically and in text-only mode, and, most importantly, how to recover full SELinux coverage on systems where it has been disabled. The class will cover reading and modifying existing policy where necessary, so that changes to services, such as non-standard directory locations, are accommodated. The class will also investigate adding new, custom services to SELinux policy.
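To preview a few of these tools, a typical SELinux troubleshooting session might include commands like the following (the local policy module name is illustrative):

```shell
# Check the current mode (Enforcing, Permissive, or Disabled)
getenforce

# Search the audit log for recent AVC denials involving httpd
ausearch -m avc -ts recent -c httpd

# Restore default file contexts after content has been moved
restorecon -Rv /srv/www

# Prefer flipping an existing boolean over writing new policy
setsebool -P httpd_can_network_connect on

# Generate, then load, a local policy module from logged denials
ausearch -m avc -ts recent | audit2allow -M mylocal
semodule -i mylocal.pp
```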

This class includes exercises that will be performed using a provided VM.

 

Who should attend:

Linux system administrators and security managers who want or are required to use SELinux. Participants must be familiar with Linux system administration; previous frustration with SELinux is expected but not required.

Take back to work:

The ability to run Linux servers and desktops with SELinux enabled and to modify policy to handle configurations not supported by the default policy.

Topics include:
  • SELinux uncloaked
    • Types, contexts, and roles
    • Context-based policy
    • Extensions to familiar commands
    • Using the sandbox command
  • Using the audit file
    • Tools for deciphering audit messages
    • Searching audit messages
    • Using setroubleshoot
  • Adjusting file/directory context
    • Fixing common access problems
  • Using Booleans to adjust policy
  • Extending policy
    • Using audit2allow to correct policy
    • Using sepolgen to create new policies
    • Restricting users
    • SELinux rule syntax
    • Understanding and using macros
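As a sketch of the context and Boolean adjustments listed above (the path and Boolean are illustrative examples; always review generated policy before loading it):

```shell
# Adjust file context: record the rule persistently, then relabel
semanage fcontext -a -t httpd_sys_content_t '/srv/www(/.*)?'
restorecon -Rv /srv/www

# Prefer flipping an existing Boolean over writing new policy
getsebool httpd_can_network_connect
setsebool -P httpd_can_network_connect on

# Last resort: build a local policy module from logged denials
ausearch -m avc -ts recent | audit2allow -M mylocal
semodule -i mylocal.pp
```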

 


Shumon Huque, University of Pennsylvania

Shumon Huque (M2, T2) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade. Web site: http://www.huque.com/~shuque/.

IPv6 and DNSSEC
M2
Using and Migrating to IPv6
Updated!
Shumon Huque, University of Pennsylvania


9:00 am–5:00 pm

Nautilus 3

The Internet is facing an imminent exhaustion of IP addresses. IPv6, the next-generation Internet Protocol, is designed to address this problem, among other things. If you have not yet started to deploy IPv6, now is the time. The "World IPv6 Launch" event took place in June 2012, and major industry players such as Google, Facebook, Comcast, Yahoo!, Akamai, and Cisco already support IPv6, with many others coming on board in the near future. You need to be next!

This tutorial will provide a detailed introduction to IPv6 and will also cover practical aspects of configuring and using IPv6 in networks, operating systems, and applications. Basic knowledge of IPv4 is assumed, but no prior knowledge of IPv6 is necessary. The course will go over the basics of IPv6 and dive into the details you need.

Who should attend:

System administrators, network administrators, and application developers who need to prepare for migration to IPv6, and anyone who wants a general introduction to IPv6 and what is involved in deploying it.

Take back to work:

An understanding of IPv6, with the basic knowledge to begin designing and deploying IPv6 networks, systems, and applications.

Topics include:
  • The current IP address landscape and the motivation for IPv6
  • IPv6 addresses and protocol details
  • DHCPv6
  • DNS
  • Tunneling
  • Configuring hosts and application services
  • IPv6-related security topics

Time permitting, and depending on audience interests, further topics can be covered, such as configuring IPv6 routing or programming examples.
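To hint at the practical side, basic IPv6 configuration and checks on a Linux host might look like this (the interface name and addresses are illustrative; 2001:db8::/32 is the reserved documentation prefix):

```shell
ip -6 addr add 2001:db8:10::2/64 dev eth0    # static address assignment
ip -6 route add default via 2001:db8:10::1   # default gateway
ip -6 addr show dev eth0                     # note the fe80:: link-local address too
ping6 -c 3 2001:db8:10::1                    # basic reachability
dig AAAA www.example.com                     # does the name resolve over IPv6?
```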


Half Day Morning

Mahesh Rajani, VMware

Mahesh Rajani (M3, M8) is a Consulting Architect in the CoE group at VMware. He has been in the IT industry for over fifteen years, serving in various roles from developer through system administrator to cloud architect. Mahesh is a lead architect for the vCloud Architecture Design Toolkit. He has a Master's degree in electrical engineering from Texas A&M. He is also a VMware Certified Design Expert (VCDX-34).

Cloud Computing
M3
vCloud Advanced Topics 1: Compliance, Security, and Networking with VMware Network and Security Technologies
New!
Mahesh Rajani, VMware

9:00 am–12:30 pm

Marina 2

This class provides networking and security details of the VXLAN and vShield technologies as used in a vCloud infrastructure design. Some concepts also apply to a vSphere deployment, so the class will be of value to both vSphere- and vCloud-skilled attendees.

We will show how to use the VMware vCloud Architecture Toolkit (vCAT) in the architecture design and operation of networking and security in a vCloud infrastructure.

Attendees will learn design and operational concepts and guidelines for addressing compliance within a vCloud through the use of VMware technologies for networking and security.

Who should attend:

Cloud architects and cloud operators who are interested in the networking and security components tied to design and operation of a vCloud infrastructure. Prior knowledge of vCloud or attendance at Sunday's vCloud classes is strongly recommended but not required.

Take back to work:

Best practices and guidelines from vCAT to support business requirements for compliance and connectivity.

Topics include:
  • vCloud networking, including advanced topics such as vShield and VXLAN
  • vCloud security considerations for design and operations
  • Demonstration of features

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

M4
Core Skills: Scripting for Automation
New!
Mike Ciavarella, Coffee Bean Software Pty Ltd


9:00 am–12:30 pm

Nautilus 5

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

So how can you achieve this state of nirvana? The answer: scripting.

This class is a practical crash course in writing useful scripts that solve real-world system administration problems, using a combination of bash, Perl, and friends.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and of how to edit files in your favorite flavor of U*X is assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete the in-class tasks.

If there is sufficient interest, we will schedule a BoF on scripting challenges, where we can work through them interactively as a group.

Who should attend:

Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:

An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

Topics include:
  • Controlling programs and processes
  • Script building blocks
  • Searching
  • Working with data
  • Pipes and performance
  • Tying them all together to write effective scripts
  • When scripts might not be the best choice
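A small example of the kind of script the class builds: summarizing failed SSH logins per source address with awk, sort, and uniq. The log lines here are made-up sample data; in practice you would point the function at /var/log/auth.log or similar.

```shell
#!/bin/sh
# Summarize failed SSH login attempts per source address, busiest first.

count_failures() {
  # The source IP is the fourth-from-last field in an sshd "Failed password" line
  awk '/Failed password/ {print $(NF-3)}' "$1" | sort | uniq -c | sort -rn
}

# Made-up sample data standing in for /var/log/auth.log
cat > sample.log <<'EOF'
Dec 10 09:01:02 host sshd[100]: Failed password for root from 192.0.2.7 port 4242 ssh2
Dec 10 09:01:05 host sshd[101]: Failed password for root from 192.0.2.7 port 4243 ssh2
Dec 10 09:02:10 host sshd[102]: Failed password for admin from 198.51.100.3 port 9999 ssh2
EOF

count_failures sample.log
```

The same three-stage pattern (filter, normalize, count) automates a surprising share of everyday log analysis.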

Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

SuperSysadmin
M5
Introduction to Time Management for System Administrators
Thomas A. Limoncelli, Google, Inc.


9:00 am–12:30 pm

Nautilus 2

This tutorial is also available for live streaming. 

If you agree with any of these statements, this class is for you:

  • I don't have enough time to get all my work done.
  • As a sysadmin, I can't schedule, prioritize, or plan my work.
  • I'm spending all my time mopping the floor; I don't have time to fix the leaking pipe.
  • My boss says I don't work hard enough, but I'm always working my ____ off!
Who should attend:

Sysadmins and developers who need more time in their day, who have problems getting projects done because of constant interruptions, or who want more control over their time and the ability to schedule work instead of working at the whim of their users.

Take back to work:

The skills you need to get more done in less time.

Topics include:
  • Introduction
    • Why typical "time management" books don't work for sysadmins
    • What makes "to-do" lists fail, and how to make them work
    • How to eliminate "I forgot" from your vocabulary
  • Managing interruptions
    • Preventing them from getting to you
    • Managing the ones you get
    • Sharing the load with co-workers
  • To-do lists
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Scheduling your work (for a sysadmin? really?)
  • Task grouping: Batching, sharding, and multitasking
  • Handling the most difficult days
    • The day before a vacation
    • The day a big outage disrupts your perfectly planned day

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years who then spent an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

Cloud Computing
M6
Building a Massively Scalable, Geographically Disparate IaaS Cloud
New!
David Nalley, Apache CloudStack/Fedora


9:00 am–12:30 pm

Nautilus 1

In this half-day class we'll give system administrators an understanding of IaaS clouds, with the reference implementation being Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud. Experience with virtualization platforms and a deep understanding of L2/L3 networking are preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud, based on Apache CloudStack, in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack
  • Next-generation network topologies, including SDN
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes
  • Maintaining availability in a "failure is assured" environment

Joshua Jensen, Cisco Systems

Joshua Jensen (M7, T1) was until recently Cisco Systems' Lead Linux IT Engineer, but is now a one-man IT shop for an autonomous project within Cisco focused on world domination. He has worked as an IBM consultant and was Red Hat's first instructor, examiner, and RHCE. Working with Linux for the past 15 years and for Red Hat for 4 1/2 years, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

M7
Linux Network Service Administration
New!
Joshua Jensen, Cisco Systems


9:00 am–12:30 pm

Marina 6

System administrators are being tasked with bringing their office environments online, whether that's one stand-alone client attached to the Internet or a distributed network of Web servers. The network services that need to be configured in order to do this can be daunting to administrators who aren't familiar with the required applications. We will cover configuration examples, as well as overviews of the underlying protocols, that attendees can take home and put to work immediately.

The tutorial will be conducted in an open manner that encourages question-and-answer interruptions.

Who should attend:

System administrators who are implementing network services and are looking for a background in the configuration of those services as well as basics of the protocols. Attendees should have some network client/server experience and a basic knowledge of UNIX administration, but they do not need to be experienced network administrators. Both new and intermediate network administrators will leave the tutorial having learned something.

Take back to work:

Confidence in the ability to set up and maintain secure network services.

Topics include:
  • Overview
  • Network services
    • SSH: Secure shell with OpenSSH
    • FTP: Explore vsftpd
    • HTTP: Apache, Tux, and Squid
    • SMTP: Postfix MTA
    • NFS: Network File Systems
    • LDAP: Global authentication with OpenLDAP
    • DHCP: DHCPD and PXE
    • DNS: ISC's BIND
    • NTP: Network Time
    • LPD: Printing with cups
  • Host-based security with TCP Wrappers and Xinetd
  • Overview of Linux packet filtering
  • Network monitoring and logging
  • Network utilities you should be using
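As one concrete example, the host-based access control provided by TCP Wrappers is driven by two files; a minimal policy allowing SSH only from one (illustrative) subnet looks like:

```
# /etc/hosts.allow  (consulted first; first match wins)
sshd : 192.0.2.0/255.255.255.0

# /etc/hosts.deny   (consulted only if nothing matched above)
sshd : ALL
```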

Half Day Afternoon

David Hill, VMware

David Hill (S7, M8) is a Senior Solutions Architect working at VMware, specializing in cloud computing, disaster recovery, and virtualization. He is a VMware Certified Advanced Professional (VCAP) and a VMware Certified Professional (VCP). David is a lead architect for the vCloud Architecture Toolkit. Before joining VMware, he was a self-employed IT consultant and architect for about fifteen years, working on projects for large consultancies and financial institutions.

Mahesh Rajani, VMware

Mahesh Rajani (M3, M8) is a Consulting Architect in the CoE group at VMware. He has been in the IT industry for over fifteen years, serving in various roles from developer through system administrator to cloud architect. Mahesh is a lead architect for the vCloud Architecture Design Toolkit. He has a Master's degree in electrical engineering from Texas A&M. He is also a VMware Certified Design Expert (VCDX-34).

Cloud Computing
M8
vCloud Advanced Topics 2: Availability, Business Continuity, and Disaster Recovery
New!
David Hill, VMware

Mahesh Rajani, VMware


1:30 pm–5:00 pm

Marina 2

Learn about availability techniques for vCloud infrastructure and workloads. This class will cover vCloud availability design aspects for site failure, with VMware Site Recovery Manager and storage replication adapters, and for workload failure, utilizing third-party backup/restore technologies.

Who should attend:

Cloud architects and cloud operators who are interested in availability, business continuity, and disaster recovery for both the vCloud infrastructure and the deployed workloads. Prior knowledge of vCloud or attendance at the three previous vCloud classes at LISA '12 is recommended but not required.

Take back to work:

Best practices and guidelines from the VMware vCloud Architecture Toolkit (vCAT) to support business requirements for compliance and connectivity.

Topics include:
  • Backup and recovery of vCloud infrastructure and workloads
  • Site failover considerations and options for vCloud infrastructure and workloads
  • Use of VMware HA, DRS, FT, and other technologies
  • Demonstrations where appropriate, if time permits

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

M9
Advanced Shell Programming
Mike Ciavarella, Coffee Bean Software Pty Ltd


1:30 pm–5:00 pm

Nautilus 5

The humble shell script is still a mainstay of UNIX/Linux system administration, despite the wide availability of alternatives such as Perl, Python, Tcl, and similar languages. This class details techniques that move beyond the quick-and-dirty shell script.

Who should attend:

Intermediate system administrators or anyone with a solid knowledge of programming and with some experience in Bourne/Korn shells or their derivatives.

Take back to work:

An understanding of how to use the "lowly" shell to achieve lofty goals.

Topics include:
  • Common mistakes and unsafe practices
  • Modular shell script programming
  • Building blocks: awk, sed, etc.
  • Writing secure shell scripts
  • Performance tuning
  • Choosing the right utilities for the job
  • Addressing portability at the design stage
  • When not to use shell scripts
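A few of the safe-practice habits above, in one small POSIX sh sketch (a minimal illustration, not production code):

```shell
#!/bin/sh
# Defensive-scripting basics: fail fast, clean up reliably, and keep
# logic in small, testable functions.
set -eu                          # exit on errors and on unset variables

tmpfile=$(mktemp)                # never hard-code /tmp/name.$$
trap 'rm -f "$tmpfile"' EXIT     # cleanup runs even on error or interrupt

# A small function is easier to test and reuse than an inline pipeline
to_upper() { tr '[:lower:]' '[:upper:]'; }

printf '%s\n' "lisa 2012" | to_upper > "$tmpfile"
cat "$tmpfile"
```

Sticking to POSIX constructs like these also addresses portability at the design stage, rather than as an afterthought.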

Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

SuperSysadmin
M10
Advanced Time Management: Team Efficiency
Thomas A. Limoncelli, Google, Inc.


1:30 pm–5:00 pm

Nautilus 2

This tutorial is also available for live streaming. 

Attendees must bring a laptop for in-class exercises. Techniques will apply to any wiki or collaborative document system, but the labs and demos will be done using Google Apps.

Who should attend:

All sysadmins who want to collaborate efficiently within their team and with others (even solo sysadmins will benefit!).

Take back to work:

Techniques to help your IT team work better, faster, and more transparently.

Topics include:
  • Meetings and email
    • Making meetings not suck
    • Handling meetings that can't be fixed
    • Stopping incoming email overload
    • Making sure your email gets read
    • How to get your co-workers to go along with your awesome ideas
  • Working better together using collaborative documents
    • Buy vs. build: How to get a team to agree
    • Common sysadmin uses of collaborative documents
    • Uncommon sysadmin uses of collaborative documents
    • Communicating a new design before you build it
    • Tracking loaner resources
    • Doing surveys
    • Assuring consistent results no matter who does the task
    • Making sure everyone on the team can share the work
    • Quick and easy way to document each service (and why you should)
    • Quick and easy way to document procedures (so others can do them for you)
    • Pager-duty tips for creating a feedback loop to assure constant improvement
    • Template for a simple IT department home page

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years who then spent an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

M11
Monitoring Servers, Networks, and Lunchrooms with Zenoss
David Nalley, Apache CloudStack/Fedora


1:30 pm–5:00 pm

Nautilus 1

To quote Tom Limoncelli, "It's not a service if you aren't monitoring it." A decade ago, monitoring meant either cobbled-together, home-grown scripts or massive, inflexible enterprise applications. Since then, monitoring has become a must-have for even the smallest environment: the hodgepodge of scripts has grown unmaintainable, and the enterprise applications, while they work well, are slow to respond to the changes happening everywhere. In that interim, monitoring with open source software has effectively become the de facto standard because, like the browser and the operating system, monitoring is now a commodity.

Zenoss is free/libre open source software for monitoring applications, networks, servers, and even whether the restroom is in use.

Who should attend:

Sysadmins and managers who are planning to use or are evaluating Zenoss as a monitoring platform, and those who are just beginning to explore systems/network monitoring. Sysadmins experiencing scaling/scope issues with other tools such as RRDtool or Nagios will also learn much. Participants are expected to be relatively well versed in operating system and application mechanics.

Take back to work:

 A good grasp of the basics of Zenoss and monitoring theory and the ability to put this information to use immediately, along with a rudimentary understanding of some of the more esoteric features Zenoss offers.

Topics include:
  • Monitoring theory
    • Status monitoring
    • Performance monitoring
    • Predictive monitoring
  • Overview of Zenoss capabilities
  • Overview of Zenoss installation
  • Methods to jumpstart monitoring
    • Auto discovery
    • Templating and inheritance
  • Deep inspection of monitoring capabilities
    • SNMP
    • WMI
    • Network Service Checks
    • Nagios
    • Syslog
    • WBEM
    • Esoteric things
  • Dealing with the information/alerts
    • Reporting
    • Alerts
  • Taking your monitoring to the next level
    • Automated dependency checking
    • Integration with configuration management systems
    • ZenPacks—monitoring everything
    • Event transforms
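As a taste of the SNMP material, the queries Zenoss issues when modeling a device can be reproduced by hand with the net-snmp command-line tools (the switch name and community string here are hypothetical):

```shell
snmpget  -v2c -c public switch1.example.com sysUpTime.0         # fetch a single value
snmpwalk -v2c -c public switch1.example.com IF-MIB::ifDescr     # walk the interface table
snmpwalk -v2c -c public switch1.example.com IF-MIB::ifOperStatus
```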

Geoff Halprin, The SysAdmin Group

Geoff Halprin (S6, S9, M12) has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the SAGE Short Topics book A System Administrator's Guide to Auditing and was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

SuperSysadmin
M12
The First Hundred Days
New!
Geoff Halprin, The SysAdmin Group


1:30 pm–5:00 pm

Marina 6

Starting at a new company gives you exactly one chance to do things differently, to establish a new approach, realign the team's efforts, and change the tenor of the discussion. To be effective in your new role as leader, you must establish rapport with the other departments and with your team, and take control of the work pipeline.

You've made the leap. You're about to start at a new company, maybe as a senior sysadmin, maybe as a team leader or manager. Now you're asking yourself, "What do I do first?"

Moving to a different company or being promoted internally gives us a unique opportunity to put aside the history of a site and look at it with a fresh set of eyes. If you're hired as the new team lead or manager, then knowing how you will get to know the new site, and how you will get on top of any site-consolidation activities, is critical to your longevity in the role.

This class discusses the various aspects of moving from a tactical (bottom-up) view of system administration to a strategic (top-down) view. We cover the initial site survey, the first steps of identifying and cauterizing open wounds, and the process of systemic review and improvement.

Who should attend:

Anyone starting or contemplating a new position, including making an in-house move that enables you to start over; anyone with a new boss who wants to understand and help that boss; anyone about to apply for a senior position who wants to take control of the interview process.

Take back to work:

A set of tools and perspectives that will help you evaluate an environment, from company structure and pain points to IT systems and team skills, and help you engage a team in improvements to the department.

Topics include:
  • Part 1: A topical view
    • Organizational awareness: The boss, the company
    • The team: Assessing the team; first repairs
    • Building the work pipeline; second repairs
    • Systems and processes: Workflow management, change management, event management
    • Round 1: Cauterizing the wound
    • Round 2: Some early wins
    • Round 3: The Big Three
    • Systemic improvement programs
  • Part 2: A temporal view
    • The first day
    • The first week
    • The first month
    • The first hundred days
Additional Materials:
Tuesday, December 11, 2012

Full Day

Joshua Jensen, Cisco Systems

Joshua Jensen (M7, T1) was until recently Cisco Systems' Lead Linux IT Engineer, but is now a one-man IT shop for an autonomous project within Cisco focused on world domination. He has worked as an IBM consultant and was Red Hat's first instructor, examiner, and RHCE. Working with Linux for the past 15 years and for Red Hat for 4 1/2 years, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

SuperSysadmin
T1
High-Availability Linux Clustering with ricci and luci
Joshua Jensen, Cisco Systems

9:00 am–5:00 pm

Marina 3

We are evolving from a single service running on a single server, complete with multiple points of hardware and software failure...but to what? With services and platforms that need to be restored before our pagers or bat-phones ring, system administrators of today have a need for high availability. Starting with the current realities of modern data centers, this full-day tutorial will explore practical uses of Linux clusters.

Who should attend:

Linux administrators who are planning to implement a multi-service failover cluster in a production environment. Course attendees should be familiar with the basics of system administration in a Linux environment. At no point will the word "cloud" be used, although novice administrators and gurus alike should leave the tutorial having learned something.

Take back to work:

The knowledge and ability to create and administer highly available services and filesystems on a Linux cluster.

Topics include:
  • Linux HA Cluster technology: Corosync, OpenAIS, rgmanager, Conga
  • Data management with shared disk implementations: SAN, iSCSI, AoE, FCoE
  • Node fencing with STONITH
  • Network power switches and IPMI
  • Clustered logical volume management
  • GFS2 filesystems with Distributed Lock Manager (DLM)
  • Service management with failover domains
  • Virtual machines as a cluster service
  • Cluster administration with luci
  • Working with cluster-unaware services
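Two ideas underlie most of the topics above: a partition of the cluster may run services only while it holds a strict majority of votes (quorum), and a failed service restarts on the highest-priority surviving node of its failover domain. A minimal sketch of both rules in Python (the function names and the simple equal-vote model are illustrative, not actual cluster-suite code):

```python
def has_quorum(total_votes, votes_present):
    """A partition may run services only while it holds a strict majority of votes."""
    return votes_present >= total_votes // 2 + 1

def pick_node(failover_domain, online):
    """Restart a service on the highest-priority online node of its failover domain."""
    for node in failover_domain:  # the domain list is ordered by priority
        if node in online:
            return node
    return None  # no eligible node: the service stays down

# In a 5-node cluster, the 3-node side of a split keeps quorum; the 2-node side must stop
# (and will typically be fenced) to avoid a split-brain scenario.
quorate = has_quorum(5, 3)
target = pick_node(["node1", "node2", "node3"], online={"node2", "node3"})
```

This is also why even-sized clusters are awkward: a 2–2 split of a 4-node cluster leaves neither side with a majority.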

 


Half Day Morning

Shumon Huque, University of Pennsylvania

Shumon Huque (M2, T2) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade. Web site: http://www.huque.com/~shuque/.

IPv6 and DNSSEC
T2
DNS and DNSSEC
New!
Shumon Huque, University of Pennsylvania

9:00 am–12:30 pm

Nautilus 3

This class will give system administrators an understanding of the DNS protocol, including advanced topics such as DNS security. It will provide practical information about configuring DNS services, using examples from the popular ISC BIND DNS software platform.

Who should attend:

Sysadmins and network engineers who are tasked with providing DNS services, as well as anyone interested in knowing more about how the DNS works.

Take back to work:

An understanding of DNS and DNSSEC, with the basic knowledge necessary to design and deploy DNS services.

Topics include:
  • The DNS protocol and how it works
  • DNS master zone file format
  • Server configurations and recommendations
  • DNSSEC (DNS Security Extensions) and how to deploy it
  • Many examples of DNS query and debugging using the "dig" tool
  • DNS and IPv6
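As a taste of the protocol mechanics listed above, the sketch below hand-assembles a DNS query in the RFC 1035 wire format: a 12-byte header followed by a question section whose name is encoded as length-prefixed labels. This is an illustrative snippet, not course material:

```python
import struct

def build_query(name, qtype=1, qid=0x1234):
    """Assemble a DNS query packet: 12-byte header + question section (RFC 1035)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length; a zero byte terminates the name
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet)
    return header + qname + struct.pack(">HH", qtype, 1)

packet = build_query("www.example.com")
```

Sending `packet` over UDP to port 53 of a resolver and decoding the reply is essentially what the `dig` tool automates for you.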

Steven Murawski, Edgenet

Steven Murawski (T3) is the Senior Windows System Engineer for Edgenet, a data services company, and a Microsoft MVP in PowerShell. In this role, he supports a dynamic infrastructure that pushes the boundaries of the Windows platform.  Steven blogs at UsePowerShell.com (http://blog.usepowershell.com).  Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community (http://gmitpuc.com) and the Greater Milwaukee Script Club (http://blog.usepowershell.com). He speaks regularly to local user groups and can be found at various conferences.

SuperSysadmin
T3
PowerShell Fundamentals
New!
Steven Murawski, Edgenet

9:00 am–12:30 pm

Nautilus 4

This tutorial is also available for live streaming. 

Do you find yourself repeating a task often? Do you have to retrieve information or change settings for a number of servers or users, at once or on a regular basis? Do you find clicking repetitively just too slow to keep up?

If you answered any of these questions with a "Yes," don't miss this half-day class. We will cover a number of ways to make you more productive in less time—and it is far easier than you may think possible.

Who should attend:

System administrators and anyone else who wants to be more productive on the Microsoft Windows platform.

Take back to work:

Usable commands and patterns to make attendees more effective in working with the Windows platform, along with familiarity with the discovery patterns in PowerShell, so that they can continue to develop their skills.

Topics include:
  • Introduction to PowerShell
    • Finding the commands you need
    • What's in the help files
    • Discovering hidden gems in command output
  • Working from the shell
    • Navigating the file system, registry, and more
    • Working with objects (everything in PowerShell is an object)
  • Working with servers and workstations
    • Discovering WMI
    • Working with text
    • Dealing with remote machines via WMI and PowerShell Remoting
  • PowerShell Version 2 and the upcoming release of Version 3

Jacob Farmer, Cambridge Computer Services

Jacob Farmer (T4, T10) is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Cloud Computing
T4
Next-Generation Storage Networking: Beyond Conventional SAN and NAS
Updated!
Jacob Farmer, Cambridge Computer Services

9:00 am–12:30 pm

Marina 2

There has been tremendous innovation in the data storage industry over the past few years.  New storage architectures have come to market to challenge traditional SAN and NAS products with nimble new designs that are much better suited to serving the increasingly virtual nature of applications and server infrastructure. Meanwhile, the allure of cloud computing and the emergence of affordable enterprise-class solid state storage devices have inspired ever more innovative approaches to storage caching, tiering, and deduplication.  This lecture is a survey of the latest trends and advances in the data storage industry.  We trace the I/O path from application to storage media and look at a wide variety of solutions to the ever-changing challenges of data storage.

Who should attend:

System administrators running day-to-day operations, enterprise architects, storage administrators. This tutorial is technical in nature, but it does not address command-line syntax or the operation of specific products or technologies. Rather, the focus is on general architectures and the different ways to tackle various storage management challenges.

Take back to work:

A better understanding of modern storage architectures, various approaches to scaling in both performance and capacity, and a framework for comparing and contrasting various types of storage solutions.

Topics include:
  • The storage I/O path and the fundamentals of storage virtualization
  • Application acceleration with solid state storage devices (SSDs)
  • Automated tiered storage and information life cycle management (ILM)  
  • Deduplication of primary storage 
  • Object storage models and content-addressable storage 
  • Leveraging the cloud for primary storage 
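One of the topics above, automated tiered storage, ultimately boils down to a placement decision: keep the most frequently accessed blocks on the fast (SSD) tier and demote everything else. A toy sketch of that decision (the block names and the simple access-count policy are invented for illustration):

```python
from collections import Counter

def plan_tiering(accesses, ssd_slots):
    """Place the most frequently accessed blocks on the SSD tier, the rest on disk."""
    counts = Counter(accesses)                       # access frequency per block
    hot = [block for block, _ in counts.most_common(ssd_slots)]
    cold = [block for block in counts if block not in hot]
    return hot, cold

# Block "a" is touched three times, "b" twice, "c" once; two SSD slots available.
hot, cold = plan_tiering(["a", "b", "a", "c", "a", "b"], ssd_slots=2)
```

Real tiering engines add recency, write coalescing, and migration cost to this picture, but the frequency-ranking core is the same.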

Mark Burgess, CFEngine, Inc.

Mark Burgess (T5, F3) is the founder, chairman, CTO, and principal author of CFEngine. In 2011 he resigned as Professor of Network and System Administration at Oslo University College, where for twenty years he led the way in theory and practice of automation and policy-based management. In the 1990s he underlined the importance of idempotent, autonomous desired state management ("convergence") and formalized cooperative systems in the 2000s ("promise theory"). He is the author of numerous books and papers on network and system administration, including the USENIX Short Topics books A System Engineer's Guide to Host Configuration and Maintenance Using Cfengine, co-authored with Æleen Frisch, and A Sysadmin's Guide to Navigating the Business World, co-authored with Carolyn Rowland. He has won several prizes for his work.

Carolyn Rowland

Carolyn Rowland (T5) began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. Carolyn will continue as co-chair of WiAC in 2013. 

SuperSysadmin
T5
A Sysadmin's Guide to Navigating the Business World
Mark Burgess, CFEngine, Inc.

Carolyn Rowland

9:00 am–12:30 pm

Nautilus 5

As a system administrator, you are already a crack technical analyst or engineer, but does your management understand the value and importance of your work? When you ask for staff or funding to support new or existing efforts, does management readily support you? Does management look to you to develop the future of IT services in your organization? Do you feel you have enough time to focus on projects and innovation instead of fire-fighting?

If you answered no to several of the above questions, then this course is for you. The first step toward improving your professional quality of life is to create a positive and collaborative relationship with your management. So why not take responsibility for the relationship? Senior management makes the decisions about budget, staffing, and, often, new services, without understanding the full impact of their decisions. What if you could make yourself part of that process? What if it was easy to ask your management for more resources because they already believed strongly in your positive contribution to the organization?

System administrators often have all the responsibility for IT systems, but none of the control. In this course we bring the two sides together by teaching the system administrator how to communicate effectively so that management will listen and understand. You can use the tactics presented here to increase your value in the organization and improve your marketability.

Make the business look good, and you become an important asset to your organization. Your management will appreciate these skills because you will be demonstrating your value in ways they understand and that empower them to make smart IT investment decisions. In turn, your professional credibility increases, putting you in a position to influence decisions impacting your role in the organization.

Who should attend:

IT people and sysadmins interested in taking their career to the next level, improving their relationship with senior management, and increasing their value and marketability.

Take back to work:

Skills to help you develop a productive relationship with your management.

Topics include:
  • How to approach management to ask for resources you need
  • Empowering management to make good IT decisions
  • Demonstrating the value of your work in a way that management will understand
  • Convincing management of the importance of time to innovate (R&D)
  • Reducing time spent fire-fighting (efficiencies and cost savings)
  • Growing organizational loyalty for your team
  • How to build the perception that you are customer-focused and mission-oriented
  • Ways to communicate the benefits of supporting a strong IT presence
    • Increased organizational competitiveness
    • Increased employee productivity
    • Cost avoidance and efficiencies
    • Risk management
    • Knowledge and information management
  • How to develop a collaborative relationship with your management that enables both sides to be successful

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator with 10 years of experience, plus an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

Cloud Computing
T6
Networking in the Cloud Age
New!
David Nalley, Apache CloudStack/Fedora

9:00 am–12:30 pm

Nautilus 1

Networking has been relatively static for decades. We've seen increases in speed, but many of the traditional topologies are inherently limited. Innovative networks are quite different. If you look at public services, such as AWS, or large private cloud deployments, you see that their networking topology looks contrary to everything that's been standard for years. In this half-day class we'll reexamine what limitations there are and what innovative options exist to remove those limitations.

Who should attend:

Advanced system or network admins with a deep understanding of L2/L3 networking who want to learn about the new networking technologies that enable networks to scale.

Take back to work:

Knowledge of emerging networking standards and where they are best used.

Topics include:
  • How massive public and private clouds build their networks to ensure scalability
  • How software defined networks work
  • Technologies worth looking at: VXLAN, NVGRE, GRE
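Of the encapsulations listed, VXLAN (RFC 7348) illustrates why these technologies scale: it tunnels L2 frames over UDP and replaces the 12-bit VLAN ID with a 24-bit VXLAN Network Identifier (VNI), allowing roughly 16 million isolated segments instead of 4,096. A sketch of the 8-byte VXLAN header:

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: I flag set, 24-bit VNI, reserved bits zero."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte 0x08 ("VNI present") followed by 3 reserved bytes
    # Word 2: 24-bit VNI in the high bytes, followed by 1 reserved byte
    return struct.pack(">I", 0x08000000) + struct.pack(">I", vni << 8)

header = vxlan_header(5001)
```

In a real deployment this header sits between an outer UDP datagram and the encapsulated Ethernet frame; the outer IP/UDP addressing is what lets the overlay cross ordinary routed networks.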

Jason Faulkner, Rackspace

Jason Faulkner (T7), a network engineer for the email and applications division of Rackspace, is responsible for maintaining Linux firewalls and load balancers for millions of business email users. He is a current member of LOPSA and an active contributor to the keepalived project. Outside of his daily responsibilities, he has also maintained the computer history website oldos.org since 2003.

T7
Real-World Insights on How to Secure and Route Your Linux Network
Updated!
Jason Faulkner, Rackspace

9:00 am–12:30 pm

Nautilus 2

We will talk about how to configure IP addresses and static routes, how to use iptables for security and convenience and the iproute2 utilities for advanced routing and IP configuration, and some "gotchas" that can happen with Linux networks. You'll learn why you should throw away ifconfig on Linux in favor of ip, and how to create and optimize iptables rule sets.

You will leave this class with enough IPv6 knowledge to kick-start your transitions or understand your preexisting IPv6 network. We'll also discuss how the loss of NAT doesn't mean your network can't be secure.

Who should attend:

Novice and intermediate Linux system and network administrators, or anyone who wants to understand the ins and outs of networking on Linux. A basic understanding of the OSI model and Linux utilities is recommended but not required. This class is especially well suited for administrators making the leap from Windows to Linux.

Take back to work:

Ways to expertly secure and route your Linux servers on an IP network.

Topics include:
  • Basics
    • Networking review
    • Basic network configuration: Sysconfig-based configuration, ifconfig/route, iproute2
    • Other useful network troubleshooting tools: ethtool, sysctl -a, ip route get (ip address)
  • Firewalls
    • High-level firewall architecture
    • ip[6]tables: differences between iptables and ip6tables, common use cases, how traffic flows through iptables, targets and matches you never knew existed
  • Advanced topics
    • Optimization
    • Complex rulesets
    • Custom chains
  • Other interesting things
    • Multiple routing tables
    • IPv6 tunneling
    • Balancing among multiple ISPs
    • Network tuning using sysctls
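A recurring theme in the topics above is how the kernel decides where a packet goes: among all routes that match a destination, the most specific (longest) prefix wins, which is exactly what `ip route get` reports for a given address. A small sketch of that lookup using Python's standard `ipaddress` module (the routing table and next-hop names are invented for illustration):

```python
import ipaddress

def route_lookup(table, destination):
    """Return the next hop of the most specific (longest-prefix) matching route."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if dst in net]
    if not matches:
        return None  # no route to host (this table does have a default route)
    return max(matches, key=lambda m: m[0].prefixlen)[1]

table = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-uplink"),  # default route
    (ipaddress.ip_network("10.0.0.0/8"), "lan"),
    (ipaddress.ip_network("10.1.2.0/24"), "dmz"),
]
```

Multiple routing tables, as covered in the class, simply mean the kernel first picks *which* table to consult (via policy rules) before running this same longest-prefix match.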

Half Day Afternoon

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

T8
Documentation Techniques for Sysadmins
Mike Ciavarella, Coffee Bean Software Pty Ltd

1:30 pm–5:00 pm

Nautilus 3

This class covers practical techniques for producing and maintaining the documentation your systems need. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

Who should attend:

System administrators who need to produce documentation for the systems they manage.

Take back to work:

The ability to make immediate, practical use of these documentation techniques.

Topics include:
  • Why system administrators need to document
  • The document life cycle
  • Targeting your audience
  • An adaptable document framework
  • Common mistakes in documenting
  • Tools to assist the documentation process

 


Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

Guido Trotter, Google, Inc.

Guido Trotter (T9), a Senior Systems Engineer at Google, has worked as a core Ganeti developer and designer since 2007. He is also a regular conference speaker, having presented at LISA, Fosdem, Linuxcon, Debconf, and other open source and community gatherings. He mostly speaks about Ganeti, virtualization in the open source world, and Linux networking features for virtualized environments.

Cloud Computing
T9
Ganeti: Your Private Virtualization Cloud "the Way Google Does It"
New!
Thomas A. Limoncelli, Google, Inc.

Guido Trotter, Google, Inc.

1:30 pm–5:00 pm

Nautilus 4

This tutorial is also available for live streaming. 

Ganeti is a cluster virtualization system developed mostly at Google but used by many organizations worldwide. Businesses and groups can leverage the power of Ganeti to easily and effectively manage fleets of physical hosts and use them to schedule virtual machine guests.

Reasons for choosing Ganeti include that it is very lightweight, simple to install and manage, and doesn't demand special storage hardware.

Who should attend:

System engineers interested in using virtualization and cloud technologies efficiently to consolidate systems and decouple physical hardware resources from virtual systems. Ideal participants are proficient with Linux/UNIX system administration and may already be using some virtualization technologies, but want to achieve a higher level of scalability for their systems by employing a cluster management technology such as Ganeti, without the need to invest money in specialized hardware resources such as SANs.

Take back to work:

The knowledge needed to create and maintain your own Ganeti cluster, to provide an IaaS cloud or virtualized services.

Topics include:
  • Setting up and managing a Ganeti cluster
  • Ganeti internals: how to make changes
  • Monitoring your cluster and dealing with failure
  • Ganeti as a back end
  • Typical and atypical use cases


Jacob Farmer, Cambridge Computer Services

Jacob Farmer (T4, T10) is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Cloud Computing
T10
Backups, Archiving, and Life Cycle Management: Riding the Wave of Data Proliferation
Jacob Farmer, Cambridge Computer Services

1:30 pm–5:00 pm

Marina 2

Most IT organizations report exponential data growth over time, and whether your data doubles every year, every two years, or every five years, the simple fact remains that if your data capacities double, then both the capacity and the performance of your backup system must double. All of this doubling stresses traditional approaches to data management. Thus, it is no surprise that backup/recovery is one of the most costly and unforgiving operations in the data center. Meanwhile, most IT organizations also report that the vast majority of their unstructured data is seldom or never accessed. Files accumulate year after year, choking the backup systems and driving up costs.

This course explores two main ways to manage the data deluge: (1) optimize backup systems by eliminating bottlenecks, streamlining operations, and bulking up backup infrastructure; and (2) manage the life cycles of unstructured data so that files that are not in active use can be managed separately from files that are in active use. We start by offering a simple framework for defining business requirements and comparing solutions at a high level. We then delve into the various mechanisms for lifecycle management and for eliminating backup system bottlenecks. Some time is spent exploring storage systems that have built-in mechanisms for data protection and lifecycle management.

Who should attend:

System administrators involved in the design and management of backup systems and policymakers responsible for protecting their organization's data.

Take back to work:

Ideas for immediate, effective, inexpensive improvements to your backup systems and a vision for how you might deploy a lifecycle management system that fits your organization.

Topics include:
  • Formulating strategies for data protection and lifecycle management
  • Identifying and addressing backup system bottlenecks
  • Managing fixed content
  • Hierarchical storage management and data migration
  • In-band versus out-of-band approaches to file lifecycle management
  • Breathing new life into tape storage
  • Deduplication: separating hype from reality
  • Object-based storage models for backup and archiving
  • Self-healing and self-protecting storage systems
  • Leveraging the cloud for backup and archiving
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David N. Blank-Edelman (T11, R2) is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl (the second edition of the Otter book), available at purveyors of fine dead trees everywhere. He has spent the past 25+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of LISA '05 and was one of the LISA '06 Invited Talks co-chairs. David is honored to have been the recipient of the 2009 SAGE Outstanding Achievement Award and to serve on the USENIX Board of Directors.

T11
Implementing WordPress for SysAdmins
New!
David N. Blank-Edelman, Northeastern University College of Computer and Information Science


1:30 pm–5:00 pm

Nautilus 5

Savvy sysadmins know that WordPress is no longer just a blogging engine. In more and more cases it hits a sweet spot for people who need a simple content management system to manage their Web site. It is powerful, relatively easy to use, doesn't require a consultant to set up, ties into a whole bunch of other services, is part of a thriving ecosystem, oh, and is (mostly) free.

If you ever get asked to create a Web site for your company, a non-profit you volunteer for, your religious community, a friend's business, an upcoming conference, or the like, you owe it to yourself to add WordPress to your toolbox. This class will teach you, from one sysadmin to another, how to implement WordPress to handle all of these scenarios.

Who should attend:

Anyone who has recently been asked to spin up a new Web site or manage an existing WordPress site.

Take back to work:

A sysadmin's understanding of WordPress and the confidence to implement it the right way.

Topics include:
  • How to choose the best way to host and develop your WordPress site
  • How to choose among the kerjillions of WordPress themes available
  • The five plug-ins you won't want to be without
  • Ways to extend WordPress to make it sing and dance the way you want it to
  • How to keep your WordPress installation as safe as possible
  • Where to find the best tips and tricks to further your WordPress knowledge

We'll go over all the steps a respectable sysadmin would take from first download to final deployment, including all of those testing/staging steps in between that non-sysadmins tend to forget. People with all levels of experience with WordPress are welcome. If you've already done a little work with WP, come extend and share your knowledge. After you take this class, don't be surprised if the next Web site you implement is running WordPress.

Additional Materials:

Theodore Ts'o, Google

Theodore Ts'o (T12, R1) has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

T12
Recovering from Linux Hard Drive Disasters
Theodore Ts'o, Google


1:30 pm–5:00 pm

Nautilus 1

Ever had a hard drive fail? Ever kick yourself because you didn't keep backups of critical files, or you discovered that your regular nightly backup didn't succeed?

Of course not: everybody keeps regular backups and verifies them to make sure they are successful. But for those people who think they might nevertheless someday need this information, this tutorial will discuss ways of recovering from storage disasters caused by failures somewhere in the hardware or software stack.

Who should attend:

Linux system administrators and users.

Take back to work:

How to recover from storage disasters caused by failures somewhere in the hardware or software stack.

Topics include:
  • How data is stored on hard drives
  • Recovering from a corrupted partition table
  • Recovering from failed software RAID systems
  • Low-level techniques to recover data from a corrupted ext2/3/4 filesystem when backups aren't available
  • Using e2image to back up critical ext2/3/4 filesystem metadata
  • Using e2fsck and debugfs to sift through a corrupted filesystem
  • Preventive measures to avoid needing to use heroic measures
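The e2image and debugfs steps listed above can be rehearsed safely before disaster strikes. A minimal sketch using standard e2fsprogs commands on a throwaway image file (the temporary path is invented for illustration; never practice on the only copy of a damaged disk):

```shell
# Build a throwaway ext2 filesystem inside an ordinary file.
IMG=$(mktemp /tmp/ext2-demo-XXXXXX)
dd if=/dev/zero of="$IMG" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$IMG"

# e2image saves the critical metadata (superblock, group descriptors, inode tables).
e2image "$IMG" "$IMG.e2i" 2>/dev/null

# debugfs opens the filesystem read-only for inspection.
debugfs -R 'stats' "$IMG" 2>/dev/null | head -n 4
```

The same commands work against a real device node such as /dev/sdb1 when run as root.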
Additional Materials:
Wednesday, December 12, 2012

Full Day

Alan Robertson, IBM Linux Technology Center

Alan Robertson (W1) founded the High-Availability Linux (Linux-HA) project in 1998 and led the project for ten years; it has since become the Pacemaker project. He worked for SuSE for a year, then worked in IBM's Linux Technology Center for five years. Alan continues to work on Linux and UNIX systems for IBM on a variety of projects. His most recent open source project is the extremely scalable, discovery-driven Assimilation Monitoring Project.

Before joining SuSE, he was a Distinguished Member of Technical Staff at Bell Labs. He worked for Bell Labs for twenty-one years, in a variety of roles, among which were providing leading-edge computing support, writing software tools, and developing voicemail systems.

W1
Pacemaker and Linux-HA: World-Class High Availability Software
Alan Robertson, IBM Linux Technology Center


9:00 am–5:00 pm

Marina 4

The Linux-HA project (http://linux-ha.org/), together with its child project, Pacemaker, is the oldest and most powerful open source high-availability (HA) package available, comparing favorably to well-known commercial HA packages. This software runs on a variety of POSIX-like systems, including FreeBSD, Solaris, and OS X.

Pacemaker+Linux-HA provides highly available services on clusters from one to more than 16 nodes with no single point of failure. These services and the servers they run on are monitored. If a service should fail to operate correctly, or a server should fail, the affected services will be quickly restarted or migrated to another server, dramatically improving service availability.

Pacemaker supports rules for expressing dependencies between services, and powerful rules for locating services in the cluster. Because these services are derived from init service scripts, they are familiar to system administrators and are easy to configure and manage.
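As an illustration of what such rules look like, here is a sketch in the crm configuration shell; the resource names and IP address are invented, and exact syntax varies between Pacemaker releases:

```
primitive vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.122.10 op monitor interval=30s
primitive web lsb:apache2 op monitor interval=60s

# Keep the web server on the node holding its address, and start the address first.
colocation web-with-vip inf: web vip
order vip-before-web inf: vip web
```

The lsb: resource class wraps an ordinary init script, which is why init-style services are easy to bring under cluster management.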

Who should attend:

System administrators and IT architects who architect, evaluate, install, or manage critical computing systems. It is suggested that participants have basic familiarity with System V/LSB-style startup scripts, shell scripting, and XML. Familiarity with high availability concepts is not assumed.

Take back to work:

Both the basic theory of high availability systems and practical knowledge of how to plan, install, and configure highly available systems using Linux-HA and Pacemaker.

Topics include:
  • General HA principles
  • Installation of the Linux-HA and Pacemaker software
  • Configuration overview
  • Overview of commonly used resource agents
  • Managing services supplied with init(8) scripts
  • Sample configurations for Apache, NFS, DHCP, DNS, and Samba
  • Writing and testing resource agents conforming to the Open Cluster Framework (OCF) specification
  • Creating detailed resource dependencies
  • Creating co-location constraints
  • Writing resource location constraints
  • Causing failovers on user-defined conditions
Additional Materials:

Bob Hancock, Sirguey-Hancock, Ltd.

Bob Hancock (W2) is a principal in Sirguey-Hancock, Ltd., a consulting company in New York City. He has spoken throughout the US and Europe on using parallelism and concurrency to build scalable and fast applications in Python. He is the manager of the Google Developer Group—New York and a co-organizer of NYC Python. At Pycon 2012 his talk, "Optimizing Performance with Parallelism and Concurrency," was packed and can be seen at http://www.youtube.com/watch?v=ULdDuwf48kM. You can follow his writings at bobhancock.org and the Open Source project of the implementation of the xmeans algorithm for clustering unstructured data at https://github.com/bobhancock/goxmeans.

W2
The Python Programming Language
Bob Hancock, Sirguey-Hancock, Ltd.


9:00 am–5:00 pm

Marina 6

We will take a practical tour of Python, with an emphasis on using the language to solve problems in data analysis, performance, systems administration, and network programming. The emphasis will be on solving real-world problems.

Although this will not be a lab, you should have Python 3.3 installed on your laptop if you want to follow along.  Go to python.org to download and install the version appropriate for your operating system.

Who should attend:

Programmers who want to use the Python programming language and understand how it can be applied to practical problems in data analysis, system administration, systems programming, and networking. No prior Python knowledge is required, but attendees should already be experienced programmers in at least one other programming language such as C, C++, Go, Java, Perl, or Ruby.

Take back to work:

An understanding of the Python language and how it can be used to solve your problems on a daily basis.

Topics include:
  • The Python language
    • Basic syntax
    • Core datatypes
    • Control flow and exception handling
    • Functions, generators, and co-routines
    • Modules and classes
    • Testing
    • C and Go extensions
    • The different versions of Python
  • Major library modules
    • Text processing
    • Operating system calls
    • Network programming: core and third-party
    • Web programming: client and simple server
  • Practical programming examples
    • Text parsing
    • Data analysis
    • Processing and analyzing large log files
    • Performance considerations
    • Interacting with the operating system
    • Interacting with Web services
    • Network programming
    • Threads, futures, and co-routines
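As a taste of the generator style covered under "Functions, generators, and co-routines," here is a sketch of lazy log processing; the log format is invented for illustration:

```python
# Generators let each stage process one record at a time,
# so even very large log files never have to fit in memory.
def parse_lines(lines):
    """Yield (level, message) pairs from 'LEVEL: message' lines."""
    for line in lines:
        level, _, message = line.partition(": ")
        yield level, message.strip()

def only_errors(records):
    """Keep just the messages whose level is ERROR."""
    return (msg for level, msg in records if level == "ERROR")

log = [
    "INFO: service started",
    "ERROR: disk quota exceeded",
    "INFO: heartbeat ok",
    "ERROR: connection reset",
]
errors = list(only_errors(parse_lines(log)))
print(errors)  # ['disk quota exceeded', 'connection reset']
```

With open('huge.log') in place of the list, the same pipeline streams a multi-gigabyte file one line at a time.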
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick (W3) is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

SuperSysadmin
W3
Root Cause Analysis
New!
Stuart Kendrick, Fred Hutchinson Cancer Research Center


9:00 am–5:00 pm

Seabreeze

Troubleshooting is hard. I don't claim to be an expert at either doing it or teaching it. On the other hand, I have several decades of experience wielding packet analyzers, debuggers, and log parsers and have accumulated various strategies that I believe you'll find useful. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 3–5, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate, on about a half-hour cycle, between coming together as a class and working in groups. During class time, I describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. I bring a dozen baseball caps emblazoned with Sys Admin or Storage Admin or End-User and will role-play those personas as needed.

An example: You ask the sysadmin to reboot the server. Meh, OK, the server has rebooted, but after a couple of minutes, the CPU utilization is pegged at 100% again. What do you want to do next?

BYOL (Bring Your Own Laptop) loaded with Wireshark and a graphics viewer (PDF and PNG) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Draft deck visible at:
http://www.skendric.com/problem/rca/Root-Cause-Analysis-LISA-2012.pdf

Who should attend:

System administrators and network engineers tasked with troubleshooting multidisciplinary problems.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies, e.g.:

  • Hourly Data Transfer Fails—Every hour, an application at the clinic wakes up, contacts its partner at a central hospital, and exchanges data, thus keeping the patient databases synchronized. Several times a day, this process fails, alerting the database administrator with the helpful message "A Network Error has occurred."
  • Many Applications Crash—Outlook crashes, Word documents fail to save, Windows Explorer hangs: The office automation applications servicing ~1500 users intermittently report a range of error messages. Suspicion falls on the mass-storage device hosting home and shared directories.
  • Slow Downloads—Intermittently, both internal and external users see slow downloads from the public Web site. Is it the load balancer or the firewall?
Additional Materials:
Thursday, December 13, 2012

Full Day

Theodore Ts'o, Google, Inc.

Theodore Ts'o (T12, R1) has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

R1
Linux Performance Tuning
Theodore Ts'o, Google, Inc.


9:00 am–5:00 pm

Seabreeze

The Linux operating system is commonly used both in the data center and for scientific computing applications; it runs in embedded systems as small as a wristwatch, as well as on large mainframes. As a result, the Linux system has many tuning knobs, so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workloads on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
    • Latency vs. throughput
    • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
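Many of the measurement tools covered are thin layers over /proc. As a minimal sketch (Linux-only, and field availability varies by kernel version), memory figures can be read directly:

```python
# Parse /proc/meminfo into a {field: kilobytes} dictionary.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # first token is the value in kB
    return info

m = meminfo()
print(m["MemTotal"], "kB total,", m["MemFree"], "kB free")
```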
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David N. Blank-Edelman (T11, R2) is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl (the second edition of the Otter book), available at purveyors of fine dead trees everywhere. He has spent the past 25+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of LISA '05 and was one of the LISA '06 Invited Talks co-chairs. David is honored to have been the recipient of the 2009 SAGE Outstanding Achievement Award and to serve on the USENIX Board of Directors.

R2
A Day Over the Edge in System Administration
David N. Blank-Edelman, Northeastern University College of Computer and Information Science


9:00 am–5:00 pm

Marina 2

It's time to learn how to break the rules, abuse the tools, and generally turn your system administration knowledge inside out. This class is a cornucopia of ideas for creative ways to take the standard (and sometimes not-so-standard) system administration tools and techniques and use them in ways no one would expect. We'll also cover some tools you may have missed.

Note: The teacher takes no responsibility should your head explode during this class.

Who should attend:

Old-timers who think they've already seen it all and those who want to develop inventive thinking early in their career. Join us and be prepared to be delighted, disgusted, and amazed. Most of all, be ready to enrich your network and system administration by learning to be different.

Take back to work:

New approaches to old problems, along with some ways to solve the insolubles.

Topics include:
  • How to (ab)use perfectly good network transports by using them for purposes never dreamed of by their authors
  • How to improve your network services by intentionally throwing away data
  • How PowerPoint karaoke can make you a better sysadmin
  • And much, much more!
Additional Materials:
Friday, December 14, 2012

Full Day

Tom Wheeler, Cloudera

Tom Wheeler's (F1) career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace, and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design and implement a high-volume data processing system for WebMD, and served as senior programmer/analyst for a brokerage firm. He now works as a Curriculum Developer at Cloudera, a company that helps organizations derive value from their data through products, consulting, training, certification, and support for Apache Hadoop and related tools.

F1
Apache Hadoop for Operations Staff
New!
Tom Wheeler, Cloudera


9:00 am–5:00 pm

Marina 2

Systems are increasingly automated, and we're generating information faster than ever before. Data storage capacity has risen while prices have fallen, finally making it possible to store and analyze vast amounts of information in order to gain valuable insight into customer behavior, important trends, and new opportunities.

At the forefront of the Big Data revolution is Apache Hadoop, an open source system that harnesses the power of commodity hardware to achieve large-scale data storage and processing. This class will begin with a solid overview of Apache Hadoop and related tools, then move into a more detailed discussion of interest to those who want to learn more about security and performance considerations, as well as the planning, installation, monitoring, and maintenance of a production Hadoop cluster.

Who should attend:

System administrators and network operations staff; architects and IT management who have a technical background; developers who are interested in a semi-technical introduction to Hadoop.

Take back to work:

What you need to get started with Hadoop.

Topics include:
  • Why the world needs Hadoop
  • What Hadoop is and what it can do
  • Overview of the Hadoop ecosystem
  • The architecture behind Hadoop
  • Hadoop cluster planning
  • Hadoop cluster operation
Additional Materials:
F2
Puppet
9:00 am–5:00 pm

Nautilus 4

We will take students through a condensed version of the three-day Puppet Master training: describing the Puppet resource model, introducing the Puppet language, writing and testing Puppet modules, and deploying the Puppet Master and Dashboard. To get the most out of this class, students should bring a laptop running either VMware or VirtualBox.

Who should attend:

System administrators who are interested in deploying Puppet to subdue the chaos in their infrastructure.

Take back to work:

A thorough understanding of what you need to know and do to deploy Puppet in your organization.

Topics include:
  • Describing system state via Puppet Resource
  • Organizing resources in Puppet modules with classes and defines
  • Writing custom facts and functions
  • Automating configuration of network services
  • Deploying Puppet Master and Dashboard
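To give a flavor of the resource model described above, here is a minimal sketch in Puppet's own language; the module and service names are illustrative, not part of the course material:

```puppet
# Desired state, not commands: Puppet converges the system toward this description.
class ntp {
  package { 'ntp':
    ensure => installed,
  }
  file { '/etc/ntp.conf':
    ensure  => file,
    source  => 'puppet:///modules/ntp/ntp.conf',
    require => Package['ntp'],           # install the package before managing its config
  }
  service { 'ntpd':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/ntp.conf'],  # restart when the config changes
  }
}
```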

 

Additional Materials:

Mark Burgess, CFEngine, Inc.

Mark Burgess (T5, F3) is the founder, chairman, CTO, and principal author of CFEngine. In 2011 he resigned as Professor of Network and System Administration at Oslo University College, where for twenty years he led the way in theory and practice of automation and policy-based management. In the 1990s he underlined the importance of idempotent, autonomous desired state management ("convergence") and formalized cooperative systems in the 2000s ("promise theory"). He is the author of numerous books and papers on network and system administration, including the USENIX Short Topics books A System Engineer's Guide to Host Configuration and Maintenance Using Cfengine, co-authored with Æleen Frisch, and A Sysadmin's Guide to Navigating the Business World, co-authored with Carolyn Rowland. He has won several prizes for his work.

F3
Configuration Management Solutions with CFEngine 3
Mark Burgess, CFEngine, Inc.


9:00 am–5:00 pm

Marina 6

Following a complete rewrite of CFEngine with its popular new syntax and powerful pattern matching capabilities, this full-day class presents an introduction suitable for new users, as well as for users of CFEngine 2. The class is peppered with configuration examples, which can now be self-contained and modularized to an unprecedented degree in the new language.

Who should attend:

Anyone with a basic knowledge of configuration management who is interested in learning the next-generation tool.

Take back to work:

An understanding of the new features of the completely rewritten CFEngine 3, including its new syntax and benefits.

Topics include:
  • Moving from ad hoc scripts to automation
  • The importance of convergence
  • The Promise model
  • Templates and data types
  • Quickstart configuration
  • Creating configuration libraries
  • Upgrading from CFEngine 2
  • Example configurations and demos
  • Achieving compliance with standards and regulations
  • CFEngine on Windows and the Registry
  • Monitoring and self-healing
  • Brief overview of the community and commercial CFEngine roadmap
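As a flavor of the rewritten language, here is a minimal, self-contained CFEngine 3 sketch of a convergent file promise; the path and permissions are illustrative:

```cfengine3
body common control
{
  bundlesequence => { "config_files" };
}

bundle agent config_files
{
  files:
    # Promise: this file exists with these permissions; cf-agent repairs any drift.
    "/etc/motd"
      create => "true",
      perms  => p_644;
}

body perms p_644
{
  mode => "644";
}
```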

 

Additional Materials:
Tuesday, February 12, 2013

Half Day Morning

Jeff Darcy, Red Hat

Jeff Darcy (T1) has worked on network and distributed storage problems for twenty years, including an instrumental role in developing MPFS (a precursor of modern pNFS) while at EMC and leading the HekaFS project more recently. He is currently a member of the GlusterFS architecture team at Red Hat, coordinating the integration of HekaFS's features and leading the asynchronous-replication development effort.

T1
Building a Cloud Storage System
Jeff Darcy, Red Hat


9:00 am–12:30 pm

Crystal Room

The trend toward moving computation into the cloud has resulted in new expectations for storage in the cloud. This tutorial will provide information necessary to build your own cloud-appropriate storage system.

Who should attend:

Primarily, people who wish to implement their own task-specific cloud storage systems. Secondarily, those who wish to understand why existing cloud storage systems have been designed the way they are, and what tradeoffs they have made to achieve their respective goals.

Topics include:
  • New requirements: Application-level users of cloud storage have come to expect a variety of data and consistency/ordering models well beyond those provided by traditional file, block, or relational-database systems.
  • New constraints: Systems deployed in the cloud are often characterized by low levels of trust (user/user and user/provider) and lack of hardware access or configuration flexibility.
  • Techniques: Implementing a system to meet these new requirements and constraints will require a thorough knowledge of cluster and distributed-system techniques such as vector clocks, Merkle trees, Bloom filters, and various kinds of append-only storage.
  • Case studies: Existing systems representing successful use of these techniques will be examined.
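As a small taste of the techniques named above, here is a sketch of a Bloom filter, a compact set-membership structure with no false negatives; the sizes and hash counts are arbitrary illustration values:

```python
import hashlib

class BloomFilter:
    """Probabilistic set: 'in' may rarely report a false positive, never a false negative."""
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a big integer used as a bit array

    def _positions(self, item):
        # Derive k bit positions from SHA-256 of (salt, item).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("object-3")
print("object-3" in bf)  # True: members are always found
```

In a cloud store, such a filter lets a node cheaply test "might this replica hold the key?" before paying for a disk read or network round trip.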
Additional Materials:

James S. Plank, University of Tennessee

Jim Plank (T2) is a professor in the EECS department at the University of Tennessee. He has done research on fault-tolerant computing and storage systems for over 20 years. For the past eight years, his sole focus has been on the design, implementation, and performance of erasure codes in storage systems. He has published numerous papers on the topic, including a very popular tutorial on Reed-Solomon codes and a complete treatment of Minimum Density codes for RAID-6. His open-source libraries for Galois Field arithmetic and for general erasure-coding have been in widespread use by industry and academia.

Cheng Huang, Microsoft Research

Cheng Huang (T2) is a researcher at Microsoft Research, Redmond. He has worked extensively on erasure coding and invented technologies that have been incorporated in a wide variety of Microsoft products, such as in Lync for smooth video conferencing and in Xbox for bounding communication latency between consoles and the cloud. His latest work is LRC, a new class of erasure codes deployed in Windows Azure Storage, which saves the Microsoft Cloud millions of dollars (see http://research.microsoft.com/en-us/news/features/erasurecoding-090512.aspx) and also received the best paper award at USENIX ATC '12.

T2
Erasure Coding for Storage Applications
James S. Plank, University of Tennessee

Cheng Huang, Microsoft Research


9:00 am–12:30 pm

Gold Room

From disk arrays through clouds to archival systems, storage systems must tolerate failures and prevent data loss. Erasure coding provides the fundamental technology for storage systems to add redundancy and tolerate failures. This tutorial will cover the fundamentals of erasure coding, the mechanics of many erasure codes that apply to today's storage systems, and the properties of various erasure codes designed for a variety of storage scenarios.

Topics include:
  • General matrix-based codes, starting with classic Reed-Solomon codes
  • Galois Field arithmetic for erasure-coding, and how to implement it efficiently
  • RAID-6 codes: RDP, EVENODD, Minimum Density, X-Code
  • More general codes implemented with only the XOR operation: Generalized RDP/EVENODD
  • Cauchy Reed-Solomon codes
  • Open source library support for erasure codes
  • The reconstruction problem and techniques to reduce bandwidth and I/O
  • Regenerating codes
  • Practical MDS codes with efficient reconstruction: Rotated Reed-Solomon 
  • Practical non-MDS codes with efficient reconstruction and their application in cloud storage: Pyramid codes, LRC and its deployment in Windows Azure Storage, PMDS
  • Erasure coding for Flash
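As a small taste of the XOR-based codes listed above, here is a hedged Python sketch of single-parity encoding and single-erasure recovery — the simplest special case (RAID-4/5 style); the RAID-6 codes covered in the tutorial, such as RDP and EVENODD, tolerate two erasures:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def encode_parity(data_blocks):
    # Single parity block = XOR of all data blocks (RAID-4/5 style).
    return xor_blocks(data_blocks)

def recover_one(surviving_blocks, parity):
    # Any one missing data block is the XOR of the parity and the survivors,
    # because XOR is its own inverse.
    return xor_blocks(surviving_blocks + [parity])
```

Losing block `b` from blocks `a, b, c` with parity `p = a^b^c` is repaired as `b = a^c^p`; the more general codes in the tutorial extend this idea with Galois Field arithmetic.
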
Additional Materials:

Half Day Afternoon

Sudipta Sengupta, Microsoft Research

Dr. Sudipta Sengupta (T3) is currently at Microsoft Research, where he is working on data center systems and networking for cloud computing, non-volatile memory for cloud/server applications, data deduplication, and storage virtualization. Previously, he spent five years at Bell Laboratories, the Research Division of Lucent Technologies. His work on primary data deduplication will ship as a new feature in Windows Server 2012. His work on flash-memory based key-value stores has been incorporated in the data processing and serving pipeline for multiple properties in Microsoft's Bing system. He designed the network topology and routing algorithms for VL2, a low-cost, flexible, and agile next generation data center network, which has been deployed in Microsoft's cloud data centers.

Dr. Sengupta received the IEEE William R. Bennett Prize and the IEEE Leonard G. Abraham Prize for his work on oblivious routing of Internet traffic. At Bell Labs, he received the President's Teamwork Achievement Award for technology transfer of research into Lucent products. At Microsoft, he received the Gold Star Award which recognizes excellence in leadership and contributions for Microsoft's long term success.

Dr. Sengupta has taught advanced courses/tutorials at many academic/research and industry conferences. He has published 75+ research papers in some of the top conferences, journals, and technical magazines. He has authored 45+ patents (granted or pending) in the area of computer systems, storage, and networking. Dr. Sengupta received a Ph.D. and an M.S. from MIT (Cambridge, MA, USA) and a B.Tech. from IIT-Kanpur (India), all in Computer Science. He was awarded the President of India Gold Medal at IIT-Kanpur for graduating at the top of his class across all disciplines.

T3
Data DeDuplication: Technologies, Trends, and Challenges
Sudipta Sengupta, Microsoft Research


1:30 pm–5:00 pm

Crystal Room

This tutorial will introduce the state of the art in data deduplication systems for storage. Most of the material will be presented in a self-contained manner, although we expect attendees to have some background in the basic concepts of storage systems.

The storage market is witnessing unprecedented growth, with enterprise storage growing 50–60% per year and cloud storage growing even faster. Data deduplication is the #1 feature for which customers ask when they invest in storage solutions. Data deduplication detects and eliminates redundancies in data, with the benefits applying to both storage capacity savings ("data at rest") and network bandwidth savings ("data on wire"). In addition to taming the growth in storage total-cost-of-ownership, the storage capacity savings can help to make high IOPS devices like flash-based SSDs more feasible in terms of cost. The network bandwidth savings can help to mitigate WAN bottlenecks, thus enabling user-to-cloud and hybrid private-public cloud storage scenarios.

Backup data deduplication has been around for about a decade, championed by early startups in the space such as Data Domain. Recent developments bring data deduplication to the more expensive and faster primary storage tier, where deduplication space savings are more valuable, translating to reductions in the amount of data that needs to be replicated, geo-replicated, cached, backed up, and transferred over the network.

In this tutorial, we will survey technologies in the data deduplication area at both the algorithmic and systems levels. We will follow the progression of ideas over time and identify current trends in research and industry. We will outline the challenges that need to be addressed going forward. Topics covered will include research aspects of the entire data deduplication pipeline—data chunking, data indexing, primary data access, storage maintenance operations—as well as case studies of commercially deployed systems.
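As a rough illustration of the chunking-and-indexing stages of that pipeline, here is a hypothetical Python sketch using fixed-size chunks and a SHA-256 chunk index (all names are illustrative; production systems typically use content-defined chunking rather than fixed-size chunks):

```python
import hashlib

def dedup_store(data, chunk_size=8, index=None):
    """Split data into fixed-size chunks, storing each unique chunk once.

    Returns (recipe, index): the recipe is the ordered list of chunk hashes
    needed to reconstruct `data`; the index maps hash -> chunk bytes.
    """
    index = {} if index is None else index
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        index.setdefault(h, chunk)   # duplicate chunks are stored only once
        recipe.append(h)
    return recipe, index

def rebuild(recipe, index):
    # Reassemble the original data from its recipe of chunk hashes.
    return b"".join(index[h] for h in recipe)
```

The capacity saving is the gap between the logical size (length of the recipe) and the physical size (number of unique chunks in the index).
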

Who should attend:

Graduate students and researchers working in the areas of storage, enterprise computing, cloud computing, and enterprise/Web services; practicing storage professionals in the technology industry, especially in enterprise and cloud data center space.

Additional Materials:

Sandeep Uttamchandani, VMware

Dr. Sandeep Uttamchandani (T4) is the Technical Director for Storage at VMware. Sandeep has worked on a wide variety of enterprise storage products and technologies and has been closely involved in the operational management of petabyte-scale, business-critical deployments. Sandeep holds 22 issued patents and has 28 peer-reviewed publications in key storage conferences, including FAST, USENIX ATC, and SIGMOD. Prior to VMware, Sandeep was the Chief Architect for Advanced Storage Technologies at IBM GTS, where he was responsible for shaping the technical storage strategy for a $10B services business. Previously, Sandeep was a Master Inventor at the IBM Storage Research Center at Almaden. He holds a Master's and a PhD in Computer Science from the University of Illinois at Urbana-Champaign (UIUC).

T4
Design Trade-offs of CAP Theorem and Beyond: Understanding Implications of Design Choices in a Software-defined Shared Nothing Storage Architecture
Sandeep Uttamchandani, VMware


1:30 pm–5:00 pm

Gold Room

The key objective of this tutorial is to provide an understanding of how the design choices made for the key building blocks (such as metadata service, replication, locking, etc.) impact the overall properties of a shared nothing storage architecture, and how to map those choices to the application data model and storage workload requirements.

Eric Brewer coined the CAP theorem to convey that the design of a scale-out system involves trade-offs. CAP is commonly oversimplified to mean that between Consistency, Availability, and Partition tolerance, only two of the three attributes can be realized in a system. In general, the architecture of any shared nothing scale-out storage involves a collection of design choices and trade-offs that ultimately dictate the observable behavior of the system. Following are some choices involved in the design of a shared nothing storage solution:

  • Data locality versus cluster scalability?
  • Master versus masterless metadata architectures?
  • Locking versus multi-version concurrency control?
  • Strong versus eventual versus weak consistency?
  • Replication versus RAID?
  • Node-to-node communication: UDP versus TCP versus RDMA?
  • Two-phase commit versus Paxos versus Multi-Paxos?
  • In-memory data grids versus disk-based DAS architectures?
  • Data models: ACID versus BASE (Basically Available, Soft state, Eventually consistent)?

We will start the tutorial with a bare-bones skeleton of the architecture, then incrementally populate the building blocks. For each building block, we discuss popular design choices, followed by an interactive discussion on the implications of mix-and-match of these building blocks (for example, matching coarse-grained data sharding for better data locality performance, with appropriate patterns for scaling and distributed data recovery). The tutorial assumes a basic knowledge of distributed systems. Additionally, to better appreciate the under-the-hood exploration, we expect an awareness of the cloud storage landscape, and a high-level understanding of the popular solutions.
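
One of the consistency trade-offs above can be made concrete with quorum arithmetic: with N replicas, write quorum W, and read quorum R, every read overlaps the latest write whenever R + W > N. A hypothetical Python sketch (not from the tutorial; the "stale" ends of the replica list are chosen deliberately to show the worst case):

```python
def write(replicas, w, version, value):
    # Write to a quorum of W replicas; the rest stay stale (no repair here).
    for rep in replicas[:w]:
        rep["version"] = version
        rep["value"] = value

def read(replicas, r):
    # Read a quorum of R replicas; return the highest-versioned value seen.
    # Reading from the *other* end of the list models the worst-case quorum.
    return max(replicas[-r:], key=lambda rep: rep["version"])["value"]

# N = 3 replicas, write quorum W = 2.
replicas = [{"version": 0, "value": None} for _ in range(3)]
write(replicas, w=2, version=1, value="new")
```

With R = 2 the read quorum must intersect the write quorum (2 + 2 > 3) and returns "new"; with R = 1 (1 + 2 = 3, no guaranteed overlap) the read may return the stale initial value.
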

Who should attend:

Storage architects, engineers, administrators, and students who are interested in a deep dive into the building blocks and design patterns of software-defined shared nothing storage architectures (a.k.a. cloud storage).

Additional Materials:
Tuesday, April 2, 2013

Full Day

Vicraj Thomas, BBN Technologies and GENI Project Office (GPO)

Dr. Vicraj (Vic) Thomas is responsible for Distributed Systems and Security for the GENI Project Office (GPO). His expertise lies in the areas of highly dependable systems, distributed systems protocols and architectures, and wireless sensor networks. Vicraj is the System Engineer supporting the GENI Experimenter Workflow and Services working group. Before joining BBN Technologies, Vic was with the Honeywell Laboratories, where he developed technologies related to wireless networks for flight-essential avionics applications. Dr. Thomas was the Industrial Technology Area Leader for the sensor networks research area within the US-UK International Technology Alliance (ITA) program on Network Science. He also was a systems architect on the NASA C3I network for the Orion program. Vic earned a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Mumbai. He earned an M.S. in Computer Science from the University of Rhode Island and a Ph.D. in Computer Science from the University of Arizona.

Niky Riga, BBN Technologies and GENI Project Office (GPO)

Niky is responsible for supporting GENI users in integrating and deploying their experiments within the GENI infrastructure, and ensuring that the deployment makes the best use of GENI resources.

Before joining the GPO in 2010, Niky worked on innovative projects within the Network Research department of BBN. Her focus was designing and prototyping pioneering transport services for Mobile Ad-hoc Networks.

Niky earned a Diploma in Electrical and Computer Engineering at the National Technical University of Athens, and an MS degree in Computer Science at Boston University.

 

Using the GENI Testbed for Network and Distributed System Experiments
New!
Vicraj Thomas, BBN Technologies and GENI Project Office (GPO)


 
Niky Riga, BBN Technologies and GENI Project Office (GPO)


 

9:00 am–5:00 pm

The Global Environment for Network Innovations, GENI, is a suite of research infrastructure rapidly taking shape in prototype form across the United States. It is sponsored by the National Science Foundation, with the goal of providing a laboratory environment for networking and distributed systems research and education. It is well suited for exploring networks at scale, thereby promoting innovations in network science, security, services, and applications.

This tutorial will describe GENI as an experimenter tool and offer attendees the chance to gain hands-on experience in using the testbed. We will use the multi-campus Layer 2 network to design and run simple Layer 2 and Layer 3 experiments.

The tutorial will interleave presentations with hands-on exercises. Attendees will get a chance to create and run GENI experiments by following simple instructions, working at their own pace with help from the instructors.

Who should attend:
  • Graduate students and researchers working in areas of computer networking, future Internet architectures, distributed systems, and cloud computing
  • Professors of computer networks and distributed systems who would like to use a large-scale testbed in their classes
  • Anyone who is interested in Future Internet Architecture or distributed cloud computing
  • Anyone who wants to get acquainted with GENI, a platform for running network experiments, and gain hands-on experience in using it
Additional Materials:
Sunday, November 3, 2013

Full Day

Rik Farrow, Security Consultant

Rik Farrow began working with UNIX system security in 1984 and with TCP/IP networks in 1988. He taught his first security class in 1987 and started teaching internationally the following year. He has been a consultant since 1980 and has advised both firewall and intrusion detection companies in the design of their products. Rik has published two books, one on UNIX security and the other on system administration. He designed a five-day, hands-on class in Internet security for internal use by the Department of Defense. He wrote the "Network Defense" column for Network Magazine for over six years and is currently the editor of ;login:, the USENIX magazine. Rik lives with his wife in the high desert of northern Arizona, where he continues to work and do research, and he still ventures out to teach.

Linux
S1
Securing Linux Servers
Rik Farrow, Security Consultant


9:00 am–5:00 pm

Lincoln 2 Room

Linux servers start out very secure; it's what you do with them when you use them that can create insecurities. A straight Linux server install runs minimal services and has few users and a very restrictive firewall, which is a great security posture but is pretty useless for most purposes. As users are added, services enabled, and holes punched through the firewall, the security can deteriorate quickly.

This class will show you how to maintain a strong security posture through careful configuration and proper use of Linux tools and services. Linux contains tools and software that can be enabled to slow brute-force attacks against user accounts, can notice when your accounts have weak passwords or are under attack, can keep services and software up to date, and can sandbox applications to prevent even zero-day attacks. The class will focus on attacks most recently seen, including attacks on mail and Web servers.

Who should attend:

Linux system administrators and security managers familiar with Linux system administration, whether you manage a handful of Linux systems or large clusters.

Take back to work:

Techniques for securing and maintaining Linux servers.

Topics include:
  • Minimizing risk with appropriate restrictions
  • Managing and tracking application vulnerabilities
  • Sandboxing to prevent attacks
  • Monitoring logfiles
  • Updates and configuration management
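To give a flavor of the log-monitoring topic above, here is a hypothetical Python sketch that counts failed SSH logins per source address, the kind of signal tools like fail2ban act on (the log format shown is typical OpenSSH syslog output; the threshold and sample addresses are illustrative):

```python
import re
from collections import Counter

# Matches typical OpenSSH failure lines, capturing the source address.
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")

def failed_logins_by_ip(lines, threshold=5):
    """Return source IPs with at least `threshold` failed password attempts."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[m.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

In practice such a count feeds an alert or a temporary firewall rule against the offending address.
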
Additional Materials:

John Looney, Google

John Looney graduated in Computer Applications from Dublin City University and specialized in supporting high-end commercial UNIX environments. Noticing that Irish system administrators had no formal training opportunities, he set up the Irish Linux User Group to provide a community and free monthly training sessions on the latest open source technologies to over 400 Irish engineers. He ran LinuxWorld Dublin in 2000. While at Hosting365 he built Ireland's largest shared hosting infrastructure, serving 30,000 customers on a shoestring budget.

Today, he's a Site Reliability Engineer responsible for Google's cluster infrastructure; initially the cluster fabric, GFS and Chubby, and more recently the datacenter automation and remote access technologies. He has built a five-month full-time graduate program to take junior engineers and retrain them to take the pager for Google.com. John is on the Computing Committee for Engineers Ireland.

DevOps
S2
SRE Classroom: Non-Abstract Large System Design for Sysadmins
New!
John Looney, Google


9:00 am–5:00 pm

Thurgood Marshall North Ballroom

Truly large-scale systems are still rare; in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system and thus outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who also can fill in the detail are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to an SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data-security impact the cost of implementation.

This class will not cover concurrent programming or the software-engineering side of distributed systems.

Who should attend:

System administrators, SREs, and DevOps who have some familiarity with distributed systems, server hardware, and systems programming, especially those who would like to work with, procure, or build large distributed systems.

Take back to work:

The ability to design large distributed systems, to evaluate design proposals, and to explain such designs to third parties.

Topics include:
  • Design patterns for large distributed systems
  • Monitoring large-scale systems
  • Large-scale design workshop and presentations
  • Non-abstract design; taking a design and producing a "bill of materials"
  • Designing for failure; how to work around rack, networking, and datacenter failures
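The "bill of materials" step above is ultimately back-of-envelope arithmetic; here is a hypothetical Python sketch of the kind of estimate involved (the function name and all figures are illustrative assumptions, not numbers from the class):

```python
import math

def servers_needed(peak_qps, qps_per_server, replication=3, headroom=0.75):
    """Estimate a server count for a target load.

    Serve the peak load with each machine at `headroom` utilization (leaving
    capacity for failures and spikes), then multiply by the replication
    factor to survive machine, rack, and site outages.
    """
    serving = math.ceil(peak_qps / (qps_per_server * headroom))
    return serving * replication

# e.g. 100k QPS peak, 500 QPS per server at 75% target utilization, 3 replicas:
servers = servers_needed(100_000, 500)
```

Feeding an SLA into the headroom and replication parameters is exactly the sort of requirement-to-hardware translation the class practices.
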
Additional Materials:

Half Day Morning

Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

Coding
S3
PowerShell Fundamentals
Updated!
Steven Murawski, Stack Exchange


9:00 am–12:30 pm

Thurgood Marshall South Ballroom

Do you find yourself repeating a task often? Do you have to retrieve information or change settings on a number of servers or users at once or regularly? Do you find clicking repetitively just too slow to keep up?

If you answered any of these questions with a "Yes," don't miss this half-day class. We will cover a number of ways to make you more productive in less time—and it is far easier than you may think possible.

Who should attend:

System administrators and anyone else who wants to be more productive on the Microsoft Windows platform.

Take back to work:

Usable commands and patterns to make attendees more effective in working with the Windows platform, along with familiarity with the discovery patterns in PowerShell, so that they can continue to develop their skills.

Topics include:
  • Introduction to PowerShell
  • Finding the commands you need
  • What's in the help files
  • Discovering hidden gems in command output
  • Working from the shell
  • Navigating the file system, registry, and more
  • Working with objects (everything in PowerShell is an object)
  • Working with servers and workstations
  • Discovering WMI/CIM
  • Working with text
  • Dealing with remote machines via WMI/CIM and PowerShell Remoting
  • PowerShell Versions 2 and 3, and the upcoming Version 4
Additional Materials:

Shumon Huque, University of Pennsylvania

Shumon Huque (http://www.huque.com/~shuque/) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

Linux
S5
DNSSEC: DNS Security Extensions
New!
Shumon Huque, University of Pennsylvania


9:00 am–12:30 pm

Wilson C Room

This class will provide system administrators with a detailed understanding of the DNS Security Extensions (DNSSEC). It will provide practical information about configuring DNSSEC using the popular ISC BIND DNS software and will cover both using DNSSEC to cryptographically sign your own DNS zones and configuring DNS resolvers to validate DNSSEC signatures. Many examples of DNS/DNSSEC querying and debugging using the "dig" tool and other diagnostic tools and programs will also be covered. The last part of the course will cover prospects for newer and more exciting uses of DNSSEC by application protocols that are in the pipeline, such as DANE and TLSA records.

Who should attend:

System administrators and engineers who are tasked with providing DNS services, as well as anyone interested in knowing more about how DNS and DNSSEC work.

Take back to work:

A detailed understanding of DNSSEC with the basic knowledge to configure and deploy it.

Topics include:
  • DNS protocol basics
  • DNSSEC extensions and how they work
  • Configuring, deploying, and troubleshooting DNSSEC
  • Application uses of DNSSEC
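As one small, concrete piece of the protocol covered above: a signed zone typically publishes two kinds of DNSKEY records, distinguished by flag bits defined in RFC 4034. A hedged Python sketch of that decoding (the function name is ours):

```python
ZONE_KEY = 0x0100  # DNSKEY Zone Key flag (RFC 4034, bit 7)
SEP      = 0x0001  # Secure Entry Point flag, set on key-signing keys

def dnskey_role(flags):
    """Classify a DNSKEY record by its flags field."""
    if not flags & ZONE_KEY:
        return "not a zone key"
    return "KSK" if flags & SEP else "ZSK"
```

These are the flag values seen in `dig DNSKEY <zone>` output: 257 marks the key-signing key (KSK) that the parent's DS record points at, and 256 marks the zone-signing key (ZSK) that signs the zone's records.
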
Additional Materials:

John Sellens, SYONEX

John Sellens has been involved in system and network administration since 1986 and is the author of several related USENIX papers, a number of ;login: articles, and the USENIX Short Topics book #7, System and Network Administration for Higher Reliability. He holds an M.Math. in computer science from the University of Waterloo and is a Chartered Accountant. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the systems team at Magna International. From 1999 to 2004, he was the General Manager for Certainty Solutions in Toronto. Prior to joining Certainty, John was the Director of Network Engineering at UUNET Canada and was a staff member in computing and information technology at the University of Waterloo for 11 years.

S6
Nagios: Advanced Topics
John Sellens, SYONEX

John Sellens has been involved in system and network administration since 1986 and is the author of several related USENIX papers, a number of ;login: articles, and the USENIX Short Topics book #7, System and Network Administration for Higher Reliability. He holds an M.Math. in computer science from the University of Waterloo and is a Chartered Accountant. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the systems team at Magna International. From 1999 to 2004, he was the General Manager for Certainty Solutions in Toronto. Prior to joining Certainty, John was the Director of Network Engineering at UUNET Canada and was a staff member in computing and information technology at the University of Waterloo for 11 years.

9:00 am–12:30 pm

Thurgood Marshall West Ballroom

Nagios is a very widely used tool for monitoring hosts and services on a network. It's very flexible and configurable, and it can be extended in many ways using home-grown or existing extensions. This tutorial will cover the advanced features and abilities of Nagios and related tools, which are especially useful in larger or more complex environments, or for higher degrees of automation or integration with other systems.

Who should attend:

Network and system administrators ready to implement or extend their use of the Nagios system and network monitoring tool.

Take back to work:

The information you need to immediately implement and use the advanced features of Nagios and related tools for monitoring systems and devices on your networks.

Topics include:
  • Theory of operation
  • Configuration for more complex environments
  • Plug-ins: Their creation, use, and abuse
  • Extensions: NRPE, NSCA, NDOUtils
  • Add-ons: Graphing, integration with other tools
  • Abuse: Unexpected uses and abuses of Nagios
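
As a taste of the plug-in topic above: a Nagios plug-in is simply a program that prints one status line and reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A minimal sketch, written here as a shell function with illustrative thresholds, might check root-filesystem usage:

```shell
# Minimal Nagios-style check as a shell function (thresholds are examples);
# the return value maps to Nagios status: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
check_root_disk() {
    warn=${1:-80}    # WARNING above this percentage used
    crit=${2:-90}    # CRITICAL above this percentage used
    usage=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$usage" ]; then
        echo "DISK UNKNOWN - could not parse df output"; return 3
    elif [ "$usage" -ge "$crit" ]; then
        echo "DISK CRITICAL - ${usage}% used on /"; return 2
    elif [ "$usage" -ge "$warn" ]; then
        echo "DISK WARNING - ${usage}% used on /"; return 1
    fi
    echo "DISK OK - ${usage}% used on /"; return 0
}

# As a standalone plug-in, the script would end with:
#   check_root_disk 80 90; exit $?
```

Nagios runs such a check on a schedule and turns its exit code into host or service state.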
Additional Materials:

Half Day Afternoon

Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

Coding
S7
Building Your PowerShell Toolkit
New!
Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

1:30 pm–5:00 pm

Thurgood Marshall South Ballroom

Have you found yourself writing little PowerShell scripts to help you day to day? Do you have to share these with co-workers? Do you remember what that script you wrote six months ago does or the parameters that it needs? Join me in learning how to take your PowerShell scripting to the next level.

Who should attend:

System administrators or anyone with basic PowerShell experience who needs to take it to the next level and create reusable, production-ready commands.

Take back to work:

A pattern for taking your basic scripts and functions and turning them into professional, pipeline-ready, production-oriented commands, and the ability to assemble modules, which are the method for organizing and distributing PowerShell commands.

Topics include:
  • Using basic commands
  • Developing scripts and functions
  • Packaging scripts and functions as a solution that is reusable, redistributable, and usable with the maximum variety of input data
  • Error handling
  • Creating help files
  • Working with the pipeline (for receiving input and processing data)
  • A variety of tips and tricks to make your functions robust and practical
  • The ins and outs of packaging your commands as modules, creating module manifests, and things to consider with module development
Additional Materials:

Richard Elling, Nexenta Systems

Richard Elling has been designing and building dependable, networked computer systems to solve complex problems for more than 25 years. He was an early adopter of ZFS and has developed benchmarks and analysis techniques for evaluation of data protection schemes and performability of systems. He wrote Designing Enterprise Solutions with Sun Cluster 3.0 (Prentice Hall, 2002) and has authored many white papers, Sun BluePrints, and refereed papers on dependable systems and performability. He is a regular contributor to the ZFS community and is currently the Director of Solution Engineering for Nexenta Systems.

Linux
S8
File Systems: Top to Bottom and Back
New!
Richard Elling, Nexenta Systems

Richard Elling has been designing and building dependable, networked computer systems to solve complex problems for more than 25 years. He was an early adopter of ZFS and has developed benchmarks and analysis techniques for evaluation of data protection schemes and performability of systems. He wrote Designing Enterprise Solutions with Sun Cluster 3.0 (Prentice Hall, 2002) and has authored many white papers, Sun BluePrints, and refereed papers on dependable systems and performability. He is a regular contributor to the ZFS community and is currently the Director of Solution Engineering for Nexenta Systems.

1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

Linux systems offer a staggering number of options for file systems. With a bit of work, you can get pretty much any legacy file system recognized. For current production needs, however, only a very few of those many choices actually matter: Ext4, Btrfs, and ZFS. This course will discuss these three file systems in gory detail and present the results of a variety of performance tests, run or supervised by the presenter, which were designed to test the file systems in real computing operations (as opposed to benchmark scenarios).

Who should attend:

System administrators who wish to understand the structure and performance of modern file systems from an in-use point of view. This is not a course for kernel hackers, though; people wanting an in-depth look at the VFS code will not have their needs met.

Take back to work:

An understanding of the benefits of and differences between the three file systems under a variety of realistic scenarios.

Topics include:
  • Virtual file system (VFS) overview
  • Ext4 structure and goals
  • Btrfs structure and goals
  • ZFS structure and goals
  • Performance results
Additional Materials:

Michele Chubirka, Packetpushers

Michele Chubirka, aka "Mrs. Y.," is a recovering UNIX engineer working in network security. She is also the host of the Healthy Paranoia podcast, the security feed of Packetpushers, and official nerd hunter. She likes long walks in hubsites, traveling to security conferences, and spending time in the Bat Cave. She sincerely believes that every problem can be solved with a "for" loop. When not blogging or podcasting, she can be found using up her 15 minutes in the Twittersphere or Google+ as @MrsYisWhy.

Joe Weston, Consultant and founder of Respectful Confrontation method

Joe Weston is a workshop facilitator, consultant, and author of the book Mastering Respectful Confrontation. He is also the founder of the Heartwalker Peace Project, which creates opportunities for connection, discussion, and creative collaboration.

Soft Skills
S9
A New Model: Advancing Organizational Security Through Peacebuilding
New!
Michele Chubirka, Packetpushers

Michele Chubirka, aka "Mrs. Y.," is a recovering UNIX engineer working in network security. She is also the host of the Healthy Paranoia podcast, the security feed of Packetpushers, and official nerd hunter. She likes long walks in hubsites, traveling to security conferences, and spending time in the Bat Cave. She sincerely believes that every problem can be solved with a "for" loop. When not blogging or podcasting, she can be found using up her 15 minutes in the Twittersphere or Google+ as @MrsYisWhy.

 
Joe Weston, Consultant and founder of Respectful Confrontation method

Joe Weston is a workshop facilitator, consultant, and author of the book Mastering Respectful Confrontation. He is also the founder of the Heartwalker Peace Project, which creates opportunities for connection, discussion, and creative collaboration.

1:30 pm–5:00 pm

Wilson C Room

Why is the security industry so full of fail? We spend millions of dollars on firewalls, IPS, IDS, DLP, professional penetration tests and assessments, and vulnerability and compliance tools, and at the end of the day, the weakest link is the user and his or her inability to make the right choices. It's enough to make a security engineer cry.

The one thing you can depend upon in an enterprise is that many of your users, even with training, will still make the wrong choices. They will violate BYOD restrictions, click on links they shouldn't, respond to phishing scams, open documents without thinking, post too much information on Twitter and Facebook, use their pets' names as passwords, etc. But what if this isn't because users hate us or are too stupid? What if all our ignored policies and procedures regarding security best practices have more to do with our failure to understand modern neuroscience and the human mind's resistance to change?

Humans are wired to be emotional beings. Emotions influence most of our decisions, good and bad. In failing to understand how this is at the root of user non-compliance, no matter how much money we spend on expensive hardware and software, we will fail to achieve the goal of good organizational security.

Attendees will have the opportunity to practice various communication methods and conflict resolution skills that will improve their ability to motivate change in their organizations.

Who should attend:

Sysadmins and security professionals who need tools to improve user security compliance in their organizations.

Take back to work:

Critical mind hacks for improving your professional lives as well as your relationships with others, achieved through "hacking the human," not by software or hardware.

Topics include:
  • The affective neuroscience behind risky behavior
  • Collaborative communication methods and conflict resolution skills to crowdsource the goal of organizational security
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

Linux
S10
Build a Sysadmin Sandbox
New!
David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

1:30 pm–5:00 pm

Thurgood Marshall West Ballroom

Every sysadmin I know has at least four new technologies they would like to play with—and that's just before breakfast. But finding the time to do this can be difficult, often because of the effort it takes to find a machine, set up the network connection, firewall it from the other machines, and so on... If you want to try something that requires multiple components, that's even worse.

What you need is a sandbox: a place where you can experiment with new technologies, mess with cool tools, test the latest versions of software, prototype new services, and so on, without any sort of heavyweight effort.

Who should attend:

Sysadmins who want to test technologies safely and efficiently.

Take back to work:

The ability to build your own sysadmin sandboxes.

Topics include:
  • Using software and services like Git, Vagrant, and Amazon EC2 to make sandbox construction easy and free/inexpensive
  • When a sandbox approach is and is not appropriate
  • Migrating your work in and out of sandboxes
  • How to share the work you've done in a sandbox with others
Additional Materials:
Monday, November 4, 2013

Full Day

Shumon Huque, University of Pennsylvania

Shumon Huque http://www.huque.com/~shuque/ is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

Cloud System Administration
M1
Using and Migrating to IPv6
Updated!
Shumon Huque, University of Pennsylvania

Shumon Huque http://www.huque.com/~shuque/ is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

9:00 am–5:00 pm

Wilson C Room

The Internet is facing imminent exhaustion of its pool of IPv4 addresses. IPv6, the next-generation Internet Protocol, is designed to address this problem, among other things. If you have not yet started to deploy IPv6, now is the time. The "World IPv6 Launch" event took place in June 2012, and major industry players such as Google, Facebook, Comcast, Yahoo!, Akamai, and Cisco already support IPv6, with many others coming on board in the near future. You need to be next!

This tutorial will provide a detailed introduction to IPv6 and will also cover practical aspects of configuring and using IPv6 in networks, operating systems, and applications. Basic knowledge of IPv4 is assumed, but no prior knowledge of IPv6 is necessary. The course will go over the basics of IPv6 and dive into the details you need.
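
By way of illustration, on a Linux host the course's configuration topics map onto commands like these (a sketch using the iproute2 tools; what they print depends entirely on your host's configuration):

```shell
# Skip gracefully where iproute2's ip command is unavailable
command -v ip >/dev/null 2>&1 || { echo "ip not installed"; exit 0; }

ip -6 addr show     # IPv6 addresses assigned to each interface
ip -6 route show    # the IPv6 routing table
ip -6 neigh show    # the neighbor cache, IPv6's analogue of the ARP table
```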

Who should attend:

System administrators, network administrators, and application developers who need to prepare for migration to IPv6, and anyone who wants a general introduction to IPv6 and what is involved in deploying it.

Take back to work:

An understanding of IPv6, with the basic knowledge to begin designing and deploying IPv6 networks, systems, and applications.

Topics include:
  • The current IP address landscape and the motivation for IPv6
  • IPv6 addresses and protocol details
  • DHCPv6
  • DNS
  • Tunneling
  • Configuring hosts and application services
  • IPv6-related security topics
Additional Materials:

Branson Matheson, SGT

Branson is a 25-year veteran of system administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines, while continuing to support many open source projects. He founded sandSecurity to provide policy and technical audits, plus support and training for IT security, system administrators, and developers. Branson currently is a systems architect for NASA; has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

Linux
M2
Hands-on Security for System Administrators
New!
Branson Matheson, SGT

Branson is a 25-year veteran of system administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines, while continuing to support many open source projects. He founded sandSecurity to provide policy and technical audits, plus support and training for IT security, system administrators, and developers. Branson currently is a systems architect for NASA; has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

9:00 am–5:00 pm

Thurgood Marshall West Ballroom

System administrators often run into interesting conflicts between sysadmin, customer, and IT security needs. These conflicts generally lead to difficulty in achieving a balance between administrative convenience, good customer service, and minimal risk. There are processes or services available that can significantly improve any of these areas; however, many times they are costly or resource intensive. This course is designed for system administrators who want to improve the security posture of their organizations, using IT security standards paired with good system administration practices, resulting in better service, lower risk, and minimal impact to time and budget.

We will walk a path, covering many domains of IT security and enterprise system administration in ways that utilize the interconnection between good system administration practices and security fundamentals. We will discuss recent risks and threats, analyze them with respect to your environment, review possible impacts, and develop mitigations that you can apply immediately. Training includes instruction, discussion, many hands-on labs, and a strong dose of common sense.

Attendees should bring a laptop capable of running a Virtual Guest and will be provided a VM in which to work. The class will have access to a test network of systems for training and lab exercises. You will return to your organization with a toolbox of documentation, (mostly) free software, and a good starting point for developing better practices to improve security and system administration.

Who should attend:

Beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to improve their security. It is suggested that participants have experience with the *nix command line.

Take back to work:

Documentation, tips, tricks, and tools tailored to your environment that can be implemented to improve security posture, processes, and operations in your organization.

Topics include:
  • The relationship between system administration and IT security
  • Security theories, standards, and risk mitigation as applied by SAs
  • Information management using Trac and Subversion
  • Good system administration practices that directly improve IT security
  • Basic configuration-driven system management using Puppet
  • Host and network auditing, hardening, and monitoring
  • Developing an effective security awareness program
Additional Materials:

Half Day Morning

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Coding
M3
Core Skills: Scripting for Automation
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

9:00 am–12:30 pm

Thurgood Marshall North Ballroom

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

So how can you achieve this state of nirvana? The answer is scripting.

This class is a practical crash course in how—using a combination of bash, Perl, and friends—you can write useful scripts that solve real-world system administration problems.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and how to edit files in your favorite flavor of *nix are assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete in-class tasks.

If there is sufficient interest, we will schedule a BoF for scripting challenges, where we can work them out interactively as a group.
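
As a flavor of the kind of problem the class tackles, here is a small sketch (the log path and the "Failed password ... from <ip>" message format are assumptions; adjust both for your systems) that chains grep, awk, sort, and uniq to summarize failed ssh logins by source address:

```shell
# Count failed ssh login attempts per source IP from an auth log
# (default path and log-line format are assumptions)
top_failed_logins() {
    log=${1:-/var/log/auth.log}
    grep 'Failed password' "$log" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' |
        sort | uniq -c | sort -rn | head -10
}

# Example: top_failed_logins /var/log/auth.log
```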

Who should attend:

Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:

An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

Additional Materials:

David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

DevOps
M4
Building a Big IaaS Cloud: Building a Massively Scalable, Geographically Disparate, IaaS Cloud Using Apache CloudStack
David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

9:00 am–12:30 pm

Thurgood Marshall South Ballroom

In this half-day class, we'll give system administrators an understanding of IaaS clouds, with the reference implementation being Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud. Experience with virtualization platforms and a deep understanding of L2/L3 networking are preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud, based on Apache CloudStack, in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack
  • Next-generation network topologies, including SDN
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes
  • Maintaining availability in a "failure-is-assured" environment
Additional Materials:

Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

Linux
M5
How to Interview a System Administrator
Updated!
Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

9:00 am–12:30 pm

Harding Room

Do you know how to interview a system administrator? Do the questions you ask elicit specific, narrowly-focused information, or do they show you both the depth and breadth of a candidate's knowledge of a particular subject or technology? Do you know how to distinguish between a candidate who is just trying to bluff through the interview and one who has some knowledge of the field but hasn't yet become an expert? Are trick questions ever appropriate, and if so, when and why? Some questions shouldn't be asked, and some would even land you in hot water with your company's HR or legal department: do you know what those questions are? Finally, have you figured out how to help a candidate do well in an interview while still getting an objective and fair assessment of their skills?

If you answered "no" or even "I'm not sure" to any of these questions, this course is for you.

This tutorial will focus on techniques for interviewers, but even sysadmins who are just starting out will learn some things to use as an interviewee. Managers of system administrators and junior sysadmins will learn, among other things, how to interview someone who knows more than you do. Junior administrators will also learn how to respond (as an interviewee) when asked a bad question, in particular, how to turn it into a better question.

Who should attend:

System administrators of all levels of experience, as well as managers of system administrators.

Take back to work:

Increased confidence in your ability to weed out the posers and hire really great sysadmins.

Topics include:
  • Purposes of an interview
    • To assess the candidate's technical skills
    • To get a feel for the candidate's personality and interpersonal skills
    • To learn whether a candidate is likely to be a good fit with the company and with the IT group
    • To help the candidate figure out whether he wants this job and whether he is likely to do well in the position
    • Maybe even to teach the candidate something new about system administration
  • Basic questions to bear in mind
    • Is the candidate comfortable?
    • Does he need a drink or a bathroom break?
    • Does she know who you are and what your role in the company is?
  • Preparatory questions
    • What are you really trying to learn about the candidate's skills, and why?
    • What makes a good question good?
    • What makes a bad question bad?
    • How can you turn bad questions into good ones?
    • When is it appropriate to ask a trick question, and why?
    • What questions can't or shouldn't you ask?
Additional Materials:

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

Soft Skills
M6
Navigating the Business World for Sysadmins: The Trusted Adviser
New!
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

 
Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

9:00 am–12:30 pm

Thurgood Marshall East Ballroom

This tutorial provides guidelines and suggestions to improve the communication and collaboration of sysadmins and directly increase effectiveness when working with senior management.

This course is intended to provide you with skills that are associated with senior IT staff. The focus is on understanding the gap between IT operations and management and how to bridge that gap to increase respect for IT and improve your relationship with management. We will cover identifying your role within the organization, effective communication with upper management, positioning yourself to increase your visibility, and becoming the go-to person/group for IT within your organization.

Who should attend:

Sysadmins who want to learn specific skills that allow them to work more effectively with senior management within the organization.

Take back to work:

The ability to identify the management team relevant to IT in your organization; to plan branding, both individually and for your team; to compose effective emails that get results; to communicate IT priorities and goals effectively to management; to improve communication within the IT group; to translate business goals into IT priorities; to work collaboratively with management; to plan and conduct effective and efficient meetings; and to return to work and start building that trusted IT adviser relationship.

Topics include:
  • Examining the divide between management and IT
  • Defining "management" within your organization
  • How to communicate effectively between management and IT operations
  • Demonstrating the value of your work in a way that management will understand
  • Ways to communicate the benefits of supporting a strong IT presence
  • Knowledge and information management
  • How to develop a collaborative relationship with your management that enables both sides to be successful

Half Day Afternoon

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Linux
M7
Advanced Shell Programming
Mike Ciavarella, Coffee Bean Software Pty Ltd


1:30 pm–5:00 pm

Thurgood Marshall North Ballroom

The humble shell script is still a mainstay of UNIX/Linux system administration scripting languages, despite the wide availability of alternatives such as Perl, Python, TCL, and other similar languages. This class details techniques that move beyond the quick-and-dirty shell script.

Who should attend:

Intermediate system administrators or anyone with a solid knowledge of programming and with some experience in Bourne/Korn shells or their derivatives.

Take back to work:

An understanding of how to use the "lowly" shell to achieve lofty goals.

Topics include:
  • Common mistakes and unsafe practices
  • Modular shell script programming
  • Building blocks: awk, sed, etc.
  • Writing secure shell scripts
  • Performance tuning
  • Choosing the right utilities for the job
  • Addressing portability at the design stage
  • When not to use shell scripts
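Many of the topics above come down to a handful of defensive habits. A minimal sketch of that style (the `set` flags, `mktemp`, and `trap` usage shown here are common shell practice, not taken from the course materials):

```shell
#!/bin/sh
# Exit on the first failed command or use of an unset variable.
set -eu

# Private scratch file, cleaned up even if the script exits early.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT

# Quote every expansion so filenames containing spaces survive intact.
printf '%s\n' "one two" "three" > "$tmpfile"
while IFS= read -r line; do
    echo "got: $line"
done < "$tmpfile"
```

Run under any POSIX shell, this prints one "got:" line per record, whether or not the data contains spaces.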

David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

Chiradeep Vittal, Citrix Systems

Chiradeep Vittal is Principal Architect in the Cloud Platforms Group at Citrix Systems. He is a maintainer in the Apache CloudStack project, where he contributes to the networking and storage parts of the Infrastructure-as-a-Service (IaaS) management system. He was a founding engineer at Cloud.com, whose CloudStack product is now incubating at the Apache Software Foundation. CloudStack is deployed in more than 100 public and private clouds and powers some of the largest clouds in the world today.

Cloud System Administration
M8
Networking in the Cloud Age
David Nalley, Apache CloudStack

Chiradeep Vittal, Citrix Systems


1:30 pm–5:00 pm

Thurgood Marshall South Ballroom

Networking has been relatively static for decades. We've seen increases in speed, but many of the traditional topologies are inherently limited. Innovative networks are quite different. If you look at public services, such as AWS, or large private cloud deployments, you see that their networking topology looks contrary to everything that's been standard for years. In this half-day class we'll reexamine what limitations there are and what innovative options exist to remove those limitations.

Who should attend:

Advanced system or network admins with a deep understanding of L2/L3 networking who want to learn about new networking technologies that enable networks to scale.

Take back to work:

Knowledge of emerging networking standards and where they are best used.

Topics include:
  • How massive public and private clouds build their networks to ensure scalability
  • How software defined networks work
  • Technologies worth looking at: VXLAN, NVGRE, GRE

Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

Soft Skills
M9
Getting to Senior: Sysadmin Level IV and Beyond
New!
Adam Moskowitz


1:30 pm–5:00 pm

Harding Room

The USENIX Short Topics in System Administration book Job Descriptions for System Administrators is a fine work, and one might think that it could be used as a roadmap by sysadmins looking to advance their careers. However, the book limits itself to the words that might appear in a job posting, and is necessarily concise. Furthermore, the book emphasizes specific technical skills without much emphasis on the myriad "soft" skills a senior admin needs. A good senior system administrator needs to know more than just what's in the job description, and needs to know why these additional skills are required.

So what's a junior or mid-level sysadmin to do? Why, take this tutorial, of course!

Adam will go far beyond just reading the requirements listed in the book by providing an analysis of what is meant by the requirements and why a broad range of skills is often necessary to meet a single requirement. "Hard" skills will be balanced with "soft" skills such as professional ethics, job performance, and conduct in the workplace. To some attendees' dismay, Adam will also explain why a senior system administrator needs to be good at talking to both the least skilled users and the highest executives, and why being able to talk to these people is even more important for the most senior administrators than for the most junior.

Who should attend:

Junior and mid-level sysadmins who want advice and direction on how to advance their careers.

Take back to work:

The skills you need to learn, as well as an understanding of why you need to learn them and why many of those skills have more to do with business and management than system administration; ideas for how to go about learning what you need beyond this course.

Topics include:
  • "Hard" skills, "soft" skills
  • The LISA Job Description Levels
    • Working definitions
    • Aim of this talk in terms of these definitions
  • "Hard" skills
    • "System" knowledge
      • User commands
      • Administrative commands
      • Boot process, run levels, and init files
      • Technologies (RAID, back-ups, etc.)
        • Theory
        • Implementation
      • Configuration management (systems)
      • Networking
        • Hardware
        • Protocols
        • Applications
      • The kernel
        • Layout/data structures
        • Filesystems
        • Devices
    • Programming skills
      • Shell (and awk and sed and...)
      • Perl
      • Python, Ruby, and others
      • C and maybe assembler
      • Basic S/W engineering
        • Revision control
        • Configuration management (software)
      • Knowing when to get help
    • Standards
      • What's on your bookshelf
      • "Understanding the system"
  • "Soft" skills
    • The real role of a sysadmin
    • Understand business
    • Attitude and professionalism
      • Interacting with customers
    • Professional growth
      • Learning
      • Getting help
      • Conferences and training
  • Going beyond Sysadmin Level IV
    • "Slightly squishy" (not hard, not soft—or both)
    • Project management
    • System design and analysis
      • Technical
      • Business (cost/benefit)
    • Budgeting
    • More interacting with customers
    • Interacting with management
    • Professional growth (again)

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

Soft Skills
M10
Navigating the Business World for Sysadmins: Methods
New!
Nicole Forsgren Velasquez, Utah State University

Carolyn Rowland, NIST


1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

This tutorial provides guidelines and suggestions to support core business processes and directly increase effectiveness when working with senior management.

Who should attend:

Sysadmins who want to learn specific skills that allow them to work more effectively with senior management within the organization. This course is intended to provide you with skills that are associated with senior IT staff.

Take back to work:

The ability to write a basic capital proposal, including net present value analysis; to draft a budget, complete with impacts; to understand how to sell a technical concept to management and other non-techies; to prepare basic rationale for a technical capability with reference to a specific organizational goal; and to understand how metrics are used by management, as well as identify and calculate relevant metrics.

Topics include:
  • Navigating the capital expenditure process: understanding the capital cycle, how to write a capital proposal, and basic net present value analysis
  • Budgeting 101: the importance of the budget, how to draft a budget, selling your budget to management, and the importance of including impacts
  • How to sell an IT concept to non-IT people (e.g., agile development, DevOps, cloud computing)
  • Positioning IT as a valuable asset to the organization: understanding your organization's core business and linking the contribution of IT to that business (e.g., if security is a primary focus for the company, ensure that IT is protecting the organization's critical assets)
  • Untangling IT metrics: the importance of metrics, selecting the right metric, developing your own, and using metrics effectively
  • Managing resources: projecting resource allocation within IT, prioritization and strategic planning, and big-picture view of IT's contributions
Tuesday, November 5, 2013

Full Day

Guido Trotter, Google

Guido Trotter, a Senior Systems Engineer at Google, has worked as a core Ganeti developer and designer since 2007. He is also a regular conference speaker, having presented at LISA, Fosdem, Linuxcon, Debconf, and other open source and community gatherings. He mostly speaks about Ganeti, virtualization in the open source world, and Linux networking features for virtualized environments.

Helga Velroyen, Google

Helga has been a Google Software Engineer since 2011 and a member of the Ganeti core team since 2012. Inside Ganeti she implemented better network support, enhancements of storage handling, and various bug fixes. In her life before Google she worked at a biotech company, bringing medical analysis robots to life. She is a member of the German Chaos Computer Club and has given talks at their conferences, including one at 28c3 about hearing aid technology and one at the OpenChaos Cologne about C#/.Net. At university her focus was on data-mining and pattern recognition; she participated successfully in various data-mining competitions.

Cloud System Administration
T1
Ganeti: Your Private Virtualization Cloud "the Way Google Does It"
Updated!
Guido Trotter, Google

Helga Velroyen, Google


9:00 am–5:00 pm

Thurgood Marshall South Ballroom

Ganeti is a cluster virtualization system developed mostly at Google but used by many organizations worldwide. Businesses and groups can leverage the power of Ganeti to easily and effectively manage fleets of physical hosts and use them to schedule virtual machine guests.

Reasons for choosing Ganeti include that it is very lightweight, it is simple to install and manage, and it doesn't demand special storage hardware.

Who should attend:

System engineers interested in using virtualization and cloud technologies efficiently to consolidate systems and decouple physical hardware resources from virtual systems. Ideal participants are proficient with Linux/UNIX system administration and may already be using some virtualization technologies, but want to achieve a higher level of scalability for their systems by employing a cluster management technology such as Ganeti, without the need to invest money in specialized hardware resources such as SANs.

Take back to work:

The knowledge needed to create and maintain your own Ganeti cluster, to provide an IaaS cloud or virtualized services.

Topics include:
  • Setting up and managing a Ganeti cluster
  • Ganeti internals: how to make changes
  • Monitoring your cluster and dealing with failure
  • Ganeti as a back end
  • Typical and atypical use cases

Half Day Morning

Geoff Halprin, The SysAdmin Group

Geoff Halprin has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the USENIX Short Topics in System Administration book A System Administrator's Guide to Auditing. He was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

Soft Skills
T2
The First Hundred Days
Geoff Halprin, The SysAdmin Group


9:00 am–12:30 pm

Wilson C Room

Starting at a new company gives you exactly one chance to do things differently, to establish a new approach, realign the team's efforts, and change the tenor of the discussion. To be effective in your new role as leader, you must establish a rapport with the other departments and with your team and take control of the work pipeline.

You've made the leap. You're about to start at a new company, maybe as a senior sysadmin, maybe as a team leader or manager. Now you're asking yourself, "What do I do first?"

Moving to a different company or being promoted internally gives you a unique opportunity to put aside a site's history and look at it with a fresh set of eyes. If you're hired as the new team lead or manager, knowing how you will get to know the new site and get on top of any site consolidation activities is critical to your longevity in the role.

This class discusses the various aspects of moving from a tactical (bottom-up) view of system administration to a strategic (top-down) view. We cover the initial site survey, the first steps of identifying and cauterizing open wounds, and the process of systemic review and improvement.

Who should attend:

Anyone starting or contemplating a new position, including making an in-house move that enables you to start over; anyone with a new boss who wants to understand and help that boss; anyone about to apply for a senior position who wants to take control of the interview process.

Take back to work:

A set of tools and perspectives that will help you evaluate an environment, from company structure and pain points to IT systems and team skills, and help you engage a team in improvements to the department.

Topics include:

Part 1: A topical view

  • Organizational awareness: The boss, the company
  • The team: Assessing the team; first repairs
  • Building the work pipeline; second repairs
  • Systems and processes: Workflow management, change management, event management
  • Round 1: Cauterizing the wound
  • Round 2: Some early wins
  • Round 3: The Big Three
  • Systemic improvement programs

Part 2: A temporal view

  • The first day
  • The first week
  • The first month
  • The first hundred days

Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Linux
T3
Next-Generation Storage Networking: Beyond Conventional SAN and NAS
Jacob Farmer, Cambridge Computer Services


9:00 am–12:30 pm

Harding Room

There has been tremendous innovation in the data storage industry over the past few years.  New storage architectures have come to market to challenge traditional SAN and NAS products with nimble new designs that are much better suited to serving the increasingly virtual nature of applications and server infrastructure. Meanwhile, the allure of cloud computing and the emergence of affordable enterprise-class solid state storage devices have inspired ever more innovative approaches to storage caching, tiering, and deduplication. This lecture is a survey of the latest trends and advances in the data storage industry. We trace the I/O path from application to storage media and look at a wide variety of solutions to the ever-changing challenges of data storage.

Who should attend:

System administrators running day-to-day operations, enterprise architects, and storage administrators. This tutorial is technical in nature, but it does not address command-line syntax or the operation of specific products or technologies. Rather, the focus is on general architectures and the different ways they tackle various storage management challenges.

Take back to work:

A better understanding of modern storage architectures, various approaches to scaling in both performance and capacity, and a framework for comparing and contrasting various types of storage solutions.

Topics include:
  • The storage I/O path and the fundamentals of storage virtualization
  • Application acceleration with solid state storage devices (SSDs)
  • Automated tiered storage and information life cycle management (ILM)  
  • Deduplication of primary storage 
  • Object storage models and content-addressable storage 
  • Leveraging the cloud for primary storage

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

Linux
T4
Recovering from Linux Hard Drive Disasters
Theodore Ts'o, Google


9:00 am–12:30 pm

Thurgood Marshall North Ballroom

Ever had a hard drive fail? Ever kick yourself because you didn't keep backups of critical files, or discovered that your regular nightly backup didn't succeed?

Of course not: everybody keeps regular backups and verifies them to make sure they are successful. But for those people who think they might nevertheless someday need this information, this tutorial will discuss ways of recovering from storage disasters caused by failures somewhere in the hardware or software stack.

Who should attend:

Linux system administrators and users.

Take back to work:

How to recover from storage disasters caused by failures somewhere in the hardware or software stack.

Topics include:
  • How data is stored on hard drives
  • Recovering from a corrupted partition table
  • Recovering from failed software RAID systems
  • Low-level techniques to recover data from a corrupted ext2/3/4 filesystem when backups aren't available
  • Using e2image to back up critical ext2/3/4 filesystem metadata
  • Using e2fsck and debugfs to sift through a corrupted filesystem
  • Preventive measures to avoid needing to use heroic measures
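The e2image and debugfs workflow in the bullets above can be rehearsed safely before a real disaster by practicing on a file-backed filesystem image; the file name and size below are arbitrary choices for illustration:

```shell
# Create a throwaway ext2 filesystem inside a plain file (no root needed).
dd if=/dev/zero of=practice.img bs=1M count=8 2>/dev/null
mke2fs -q -F practice.img

# Back up only the critical metadata (superblock, group descriptors,
# inode tables); the .e2i file is much smaller than a full disk image.
e2image practice.img practice.e2i

# Inspect the superblock read-only with debugfs.
debugfs -R stats practice.img 2>/dev/null | head -n 4
```

On a real failed disk you would point the same commands at the device node instead of practice.img.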

Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

Soft Skills
T5
Advanced Time Management: Team Efficiency
Thomas A. Limoncelli, Stack Exchange


9:00 am–12:30 pm

Thurgood Marshall West Ballroom

Attendees must bring a laptop for in-class exercises. Techniques will apply to any wiki or collaborative document system, but the labs and demos will be done using Google Apps.

Who should attend:

All sysadmins who want to collaborate efficiently within their team and with others (even solo sysadmins will benefit!).

Take back to work:

Techniques to help your IT team work better, faster, and more transparently.

Topics include:
  • Collaborate online with collaborative documents, shopping lists, and Kanban tools
  • Go from "firefighting" to working on more interesting projects
  • Create meetings that don't suck
  • Hold team members accountable for tasks they take on
  • Cross train team members more effectively with fire drills (DevOps "Game Day" exercises)
  • Train people on new services more effectively
  • Eliminate problems due to inconsistent processes
  • Send more effective emails
  • Launch new services without "oops"
  • Document how things work so everyone can go on vacation

Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

DevOps
T6
Continuous Integration with Jenkins
New!
Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

9:00 am–12:30 pm

Thurgood Marshall East Ballroom

Does your software development project need a butler? Is your development team guilty of writing great code while only sporadically testing it? Does pushing your codebase to production make your programmers call in sick? Enter Continuous Integration with Jenkins. Learn how this open source project can improve your development by consistently and continuously automating your software project tasks.

Who should attend:

Junior and intermediate DevOps administrators in need of sanity when testing and deploying software.

Take back to work:

Real-world usage of Jenkins for better automation of testing and deployment with your team's software project.

Topics include:
  • Continuous Integration concepts and best practices
  • Jenkins introduction and configuration
  • Jobs, job history, and build artifacts
  • SCM integration and job triggering
  • Jenkins cluster farming: multi-node management made easy
  • Authentication considerations
  • Jenkins plugins
  • Distributed builds for heterogeneous architectures
  • Build/test/deploy pipelines
Additional Materials:

Half Day Afternoon

Jeanne Schock, Afilias

Jeanne has a background in system administration, working with FreeBSD, Solaris, Linux, and Windows. She has worked at a number of companies in the Internet industry, including a regional ISP, a video hosting company, and, for the past six years, Afilias, a Registry and DNS service provider. In her current role of Service Delivery Manager at Afilias, she is owner and manager for key ITIL processes including continual service improvement, release and deployment management, change management, problem management, and incident management. She also manages projects geared toward site reliability and disaster recovery. Jeanne got her BA from Vassar College and did graduate work at the University of Toronto in Medieval European studies, teaching Latin in her pre-IT years.

T7
Disaster Recovery Plans: Design, Implementation and Maintenance Using the ITIL Framework
New!
Jeanne Schock, Afilias

Jeanne has a background in system administration, working with FreeBSD, Solaris, Linux, and Windows. She has worked at a number of companies in the Internet industry, including a regional ISP, a video hosting company, and, for the past six years, Afilias, a Registry and DNS service provider. In her current role of Service Delivery Manager at Afilias, she is owner and manager for key ITIL processes including continual service improvement, release and deployment management, change management, problem management, and incident management. She also manages projects geared toward site reliability and disaster recovery. Jeanne got her BA from Vassar College and did graduate work at the University of Toronto in Medieval European studies, teaching Latin in her pre-IT years.

1:30 pm–5:00 pm

Wilson C Room

The only good disaster recovery plan is the one that actually works when you need it. How do you ensure readiness? How do you know that your plan is aligned with the business objectives of your company? Are you having trouble getting buy-in from management to fund your design? How do you prevent both your disaster plan and design from being out of date just weeks after implementation? This tutorial will provide a step-by-step framework for implementing a DR project through the ITIL Lifecycle approach to IT Service Continuity Management. I will demonstrate how to create a process for ongoing management of your disaster recovery capabilities and how to ensure that you are proactively improving your plan and design.

Who should attend:

System administrators and managers who are responsible for disaster planning and ensuring that the plan is ready when disaster strikes, whether you have a current strategy in place or are starting from scratch.

Take back to work:

A step-by-step framework for designing and implementing your DR strategy, and for making sure that your plan is ready when you need it.

Topics include:
  • Service Continuity Management: Introduction
    • What's a process
    • What are the benefits for DR
    • Every process should have an explicit, documented purpose
    • What you need to start and how it fits into actually implementing DR
  • Lifecycle approach to IT Service Continuity Management
    • Initiation: establish management intention, define policies, define scope, agree on objectives, initiate the project
    • Requirements and strategy
    • Implementation
    • Ongoing operation and maintenance
  • Ensuring ongoing support as well as continual improvement
    • Embed an element of continual improvement to ensure that you are always looking at ways to improve technologies, procedures, capabilities, costs
    • Process considerations
    • Service Continuity and related processes: availability, capacity, and IT security management
Additional Materials:

Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

T8
Backups, Archiving, and Life Cycle Management: Riding the Wave of Data Proliferation
Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

1:30 pm–5:00 pm

Harding Room

Most IT organizations report exponential data growth over time, and whether your data doubles every year, every two years, or every five years, the simple fact remains that if your data capacities double, then both the capacity and the performance of your backup system must double. All of this doubling stresses traditional approaches to data management. Thus, it is no surprise that backup/recovery is one of the most costly and unforgiving operations in the data center. Meanwhile, most IT organizations also report that the vast majority of their unstructured data is seldom or never accessed. Files accumulate year after year, choking the backup systems and driving up costs.

This course explores two main ways to manage the data deluge: (1) optimize backup systems by eliminating bottlenecks, streamlining operations, and bulking up backup infrastructure; and (2) manage the life cycles of unstructured data so that files that are not in active use can be managed separately from files that are in active use. We start by offering a simple framework for defining business requirements and comparing solutions at a high level. We then delve into the various mechanisms for lifecycle management and for eliminating backup system bottlenecks. Some time is spent exploring storage systems that have built-in mechanisms for data protection and lifecycle management.
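
One of the topics below, deduplication, comes down to a simple mechanism: split data into blocks, hash each block, and store each unique block only once. A toy sketch of the fixed-block variant, with function names of my own invention:

```python
import hashlib

def dedup_store(blobs, block_size=4):
    """Split each blob into fixed-size blocks and store each unique
    block once, keyed by its SHA-256 digest. Returns (store, recipes):
    store maps digest -> block; each recipe is the digest sequence
    needed to reconstruct one blob."""
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), block_size):
            block = blob[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # identical blocks stored once
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    return b"".join(store[d] for d in recipe)

# Two "backups" that share most of their content:
store, recipes = dedup_store([b"AAAABBBBCCCC", b"AAAABBBBDDDD"])
print(len(store))  # 4 unique blocks stored instead of 6
```

Real products use variable-size, content-defined chunking so that an insertion near the start of a file does not shift every later block boundary, but the storage model is the same.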

Who should attend:

System administrators involved in the design and management of backup systems and policymakers responsible for protecting their organization's data.

Take back to work:

Ideas for immediate, effective, inexpensive improvements to your backup systems and a vision for how you might deploy a lifecycle management system that fits your organization.

Topics include:
  • Formulating strategies for data protection and lifecycle management
  • Identifying and addressing backup system bottlenecks
  • Managing fixed content
  • Hierarchical storage management and data migration
  • In-band versus out-of-band approaches to file lifecycle management
  • Breathing new life into tape storage
  • Deduplication: separating hype from reality
  • Object-based storage models for backup and archiving
  • Self-healing and self-protecting storage systems
  • Leveraging the cloud for backup and archiving
Additional Materials:

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

T9
Documentation Techniques for Sysadmins
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

1:30 pm–5:00 pm

Thurgood Marshall North Ballroom

Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

Who should attend:

System administrators who need to produce documentation for the systems they manage.

Take back to work:

The ability to make immediate, practical use of these documentation techniques.

Topics include:
  • Why system administrators need to document
  • The document life cycle
  • Targeting your audience
  • An adaptable document framework
  • Common mistakes in documenting
  • Tools to assist the documentation process
Additional Materials:

Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

Soft Skills
T10
Evil Genius 101
New!
Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

1:30 pm–5:00 pm

Thurgood Marshall West Ballroom

You want to innovate: deploy new technologies such as configuration management (CFEngine, Puppet, Chef), a wiki, or standardized configurations. Your coworkers don't want change. They like things the way they are. Therefore, they consider you evil. However, you aren't evil; you just want to make things better.

Who should attend:

Anyone who wants to improve processes and learn about managing change.

Take back to work:

The communication, analysis, and persuasion skills you need to make your workplace better.

Topics include:
  • Helping your coworkers understand and agree to your awesome ideas
  • Convincing your manager about anything—really
  • Turning the most stubborn user into your biggest fan
  • Getting others to trust you so they are more easily convinced
  • Deciding which projects to do when you have more projects than time
  • Making decisions based on data and evidence
  • Driving improvements based on a methodology and planning instead of guessing and luck
Additional Materials:

Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

Cloud System Administration
T11
High-Availability Linux Clustering
Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

We are evolving from a single service running on a single server, complete with multiple points of hardware and software failure... but to what? With services and platforms that need to be restored before our pagers or Bat Phones ring, the system administrators of today have a need for high availability. Starting with the current realities of modern data centers, this half-day tutorial will explore practical uses of Linux clusters.

Who should attend:

Linux administrators who are planning on implementing a multi-service fail-over cluster implementation in a production environment. Course attendees should be familiar with the basics of system administration in a Linux environment. At no point will the word "cloud" be used, although novice administrators and gurus alike should leave the tutorial having learned something.

Take back to work:

The knowledge and ability to create and administer highly available services and filesystems on a Linux cluster.

Topics include:
  • Linux HA Cluster technology: Corosync, OpenAIS, rgmanager, Conga
  • Data management with shared disk implementations: SAN, iSCSI, AoE, FCoE
  • Node fencing with STONITH
  • Network power switches and IPMI
  • Clustered logical volume management
  • GFS2 filesystems with Distributed Lock Manager (DLM)
  • Service management with failover domains
  • Virtual machines as a cluster service
  • Cluster administration with luci
  • Working with cluster-unaware services
Additional Materials:
Wednesday, November 6, 2013

Full Day

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

Linux
W1
Linux Performance Tuning
Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

9:00 am–5:00 pm

Lincoln 4 Room

The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workloads on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.
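
As a small taste of the measurement side, here is a sketch of the kind of check the monitoring tools in this class automate: parsing `/proc/meminfo` to estimate how much RAM is actually reclaimable (free memory plus buffers and page cache), a first step in deciding whether memory is a bottleneck. The sample data is made up; on a live Linux box you would read `open("/proc/meminfo")` instead.

```python
# Parse /proc/meminfo-style output and estimate reclaimable memory.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
Buffers:          256000 kB
Cached:          8192000 kB
"""

def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        key, value = line.split(":")
        info[key] = int(value.split()[0])  # values are in kB
    return info

def reclaimable_pct(info):
    # Free + buffers + page cache is a rough upper bound on
    # memory the kernel can reclaim for applications.
    avail = info["MemFree"] + info["Buffers"] + info["Cached"]
    return round(100 * avail / info["MemTotal"], 1)

print(reclaimable_pct(parse_meminfo(SAMPLE)))  # 54.7
```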

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
  • Characterizing your workload's requirements
  • Finding bottlenecks
  • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
  • Latency vs. throughput
  • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

Linux
W2
Root Cause Analysis—Beginner
Updated!
Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

9:00 am–5:00 pm

Hoover Room

This version of the class is aimed at the mid-level sysadmin. You manage servers and/or network gear, look at packet traces, poke through logs—but wouldn't consider yourself an expert at any of this. You want a chance to tackle the problem on your own, followed by guided practice on technique: analyzing a packet trace for performance problems, extracting insights from trending charts, correlating log entries from multiple devices. In this version of the class, we spend time together reviewing concepts (e.g., caching and spindles), applying techniques (e.g., Wireshark features), and asking questions (e.g., about TCP, SMB, and NFS). In addition to the technical contributors, each team will need a problem manager—perhaps a senior engineer, perhaps a resource or project manager comfortable with coordinating teams of techs.

Troubleshooting is hard. In hindsight, the answer to a problem is often obvious, but in the chaos and confusion of the moment—with too much data flowing in, time pressure, misleading clues—slicing through the distractions and focusing on the key elements is tough. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 5–7, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate on a half-hour cycle between coming together as a class and splitting into groups. During class time, I will describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I will walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. Preview the deck to get a feel for how your day will look. BYOL (Bring Your Own Laptop) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Who should attend:

System administrators and network engineers tasked with troubleshooting multidisciplinary problems; problem managers and problem analysts wanting experience coordinating teams.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies:

  • Remote Office Bumps: A remote office ties back to the campus via a 10MB circuit. Intermittently, opening documents on the campus-based file-server is slow, printing is slow, Exchange appointments vanish…
  • Many Applications Crash: Outlook crashes, Word documents fail to save, Windows Explorer hangs: The office automation applications servicing ~1500 users intermittently report a range of error messages; users reboot their machines. Some days are fine, other days are terrible, and the symptoms are worsening…
Additional Materials:

Nathen Harvey, Opscode

Nathen Harvey is a Technical Community Manager at Opscode, the company behind Chef. Nathen is the co-organizer of DevOps DC and the Washington DC MongoDB Users' Group, and co-host of the Food Fight Show, a podcast about Chef and DevOps. Like many others who blog, Nathen updates his blog on a very irregular basis. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and Foursquare.

Coding
W3
Introduction to Chef
New!
Nathen Harvey, Opscode

Nathen Harvey is a Technical Community Manager at Opscode, the company behind Chef. Nathen is the co-organizer of DevOps DC and the Washington DC MongoDB Users' Group, and co-host of the Food Fight Show, a podcast about Chef and DevOps. Like many others who blog, Nathen updates his blog on a very irregular basis. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and Foursquare.

9:00 am–5:00 pm

Washington 5 Room

This full-day tutorial will provide the attendee with a hands-on introduction to the Chef configuration management system for performing common automation tasks. Each exercise will be instructor-led, and introduce new Chef concepts along the way. We'll cover the Anatomy of a Chef Run, Chef's Authentication Cycle, how to build roles, manipulate configuration through data in attributes, use Chef's search API for dynamic configuration, and more.

Who should attend:

Sysadmins interested in using Chef to meet their configuration management needs.

Take back to work:

Hands-on experience configuring Chef and writing Chef cookbooks. The mixture of tutorial and hands-on teaching in this course gives attendees real exposure to Chef concepts, software, and configuration. The material in this course is intended to bootstrap the attendees' knowledge and provide them with the ability to immediately work with Chef outside of the course.

Topics include:
  • Set up a local workstation with Chef and connect to a Chef server
  • Write your first Chef cookbook
  • Use Chef to automate installation of a Nagios server as a real world example
  • Automate some common system tasks with Chef
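
The core idea behind the tasks above is that Chef recipes declare desired state ("this package is installed") rather than imperative steps, and a run converges the node toward that state without redoing work already done. A toy sketch of that convergence idea, in Python rather than Chef's actual Ruby DSL (the function and resource names here are invented for illustration):

```python
# Idempotent convergence: compute and apply only the actions needed
# to bring current state to desired state; a second run is a no-op.

def converge(current, desired):
    """Mutate `current` toward `desired`; return the actions taken."""
    actions = []
    for resource, state in desired.items():
        if current.get(resource) != state:
            actions.append((resource, state))
            current[resource] = state
    return actions

node = {"nagios": "absent", "ntp": "installed"}
wanted = {"nagios": "installed", "ntp": "installed"}
print(converge(node, wanted))  # [('nagios', 'installed')]
print(converge(node, wanted))  # [] -- second run changes nothing
```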
Additional Materials:
Thursday, November 7, 2013

Full Day

Dave Beazley, Dabeaz LLC

David Beazley is the author of the Python Essential Reference and has been an active member of the Python community since 1996. He is most widely known for creating several Python-related open-source packages, including SWIG (a tool for building C/C++ extension modules to Python) and PLY (a Python version of the lex/yacc parsing tools). In the 1990s, while working at Los Alamos National Laboratory, he helped pioneer the use of Python on massively parallel supercomputers. From 1998 to 2005, he was an assistant professor at the University of Chicago, where he taught courses on operating systems, networks, and compilers. Dave is currently the owner of Dabeaz LLC, a company specializing in Python software development and training courses.

Coding
R1
The Python Programming Language
Dave Beazley, Dabeaz LLC

David Beazley is the author of the Python Essential Reference and has been an active member of the Python community since 1996. He is most widely known for creating several Python-related open-source packages, including SWIG (a tool for building C/C++ extension modules to Python) and PLY (a Python version of the lex/yacc parsing tools). In the 1990s, while working at Los Alamos National Laboratory, he helped pioneer the use of Python on massively parallel supercomputers. From 1998 to 2005, he was an assistant professor at the University of Chicago, where he taught courses on operating systems, networks, and compilers. Dave is currently the owner of Dabeaz LLC, a company specializing in Python software development and training courses.

9:00 am–5:00 pm

Madison AB Room

Python is a dynamic programming language that is often described as a "scripting language" along with languages such as Perl, Tcl, and Ruby. Although Python is often used for scripting, it is actually a full-featured, general-purpose programming language that supports a wide variety of imperative, functional, and object-oriented programming idioms. It also includes a large standard library that provides support for operating system interfaces, networking, threads, regular expressions, XML, GUIs, and more.

In this class we'll take a comprehensive tour of the Python programming language and see how it can be used to solve a variety of practical problems. The class will illustrate important concepts through examples that primarily focus on data analysis, systems programming, and system administration.
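
For a flavor of the generator idiom the class covers, here is a lazy pipeline that filters a log stream without loading it all into memory; the log lines are made up for illustration:

```python
# Generator stages compose like shell pipes; nothing runs until
# the final list comprehension iterates the chain.

lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /also-missing 404",
]

def parse(lines):
    for line in lines:
        method, path, status = line.split()
        yield {"method": method, "path": path, "status": int(status)}

def errors(records):
    return (r for r in records if r["status"] >= 400)

bad_paths = [r["path"] for r in errors(parse(lines))]
print(bad_paths)  # ['/missing', '/also-missing']
```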

Who should attend:

Programmers who want to know what the Python programming language is all about and how it can be applied to a variety of practical problems in data analysis, systems administration, systems programming, and networking. Although no prior Python knowledge is required, attendees should already be experienced programmers in at least one other programming language such as C, C++, Java, or Perl. If you already know some Python, this tutorial will improve your skills.

Take back to work:

A better understanding of what makes Python tick and an increased awareness of how it can be successfully applied to real-world problems.

Topics include:

The Python Language

  • Basic syntax
  • Core datatypes
  • Control flow and exception handling
  • Functions
  • Generators
  • Coroutines
  • Modules
  • Classes and the Python object model
  • Decorators
  • C extensions

Major library modules

  • Text processing
  • Operating system interfaces
  • Network programming
  • Internet programming

Practical Programming Examples

  • Text parsing
  • Data analysis and manipulation
  • Processing log files
  • Handling real-time data streams
  • Controlling and interacting with subprocesses
  • Interacting with web services
  • Simple network programming
  • Internet data handling
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

Linux
R2
Root Cause Analysis—Intermediate
New!
Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

9:00 am–5:00 pm

Hoover Room

This version of the class is aimed at the senior sysadmin. You have a decade or more experience in the industry, you are T-shaped (specialize in one or two areas but have expertise across a range of technologies), and you have accumulated numerous technical skills; now you want to deepen your meta-expertise. We will create the fog of war and then you’ll practice applying a methodology to focus your attention, working with your team to divvy up tasks, escalate key insights to each other, integrate clues from a range of sources, and produce reports for business leadership. In this version of the class, we spend more time in small groups and more time practicing communication skills than we do in the beginner version. In addition to the technical contributors, each team will need a problem manager—perhaps an unusually broad engineer, perhaps a resource or project manager comfortable with coordinating teams of techs.

Troubleshooting is hard. In hindsight, the answer to a problem is often obvious, but in the chaos and confusion of the moment—with too much data flowing in, time pressure, misleading clues—slicing through the distractions and focusing on the key elements is tough. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 5–7, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate on a half-hour cycle between coming together as a class and splitting into groups. During class time, I will describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I will walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. Preview the deck to get a feel for how your day will look. BYOL (Bring Your Own Laptop) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Who should attend:

Sysadmins and network engineers involved in troubleshooting multidisciplinary problems; problem managers and problem analysts wanting experience coordinating teams.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies:

  • HPC Cluster Woes: Intermittently, interactive performance on a high-performance computing cluster grinds to a halt, nodes hang, jobs vanish from the queue…
  • Storage Stumbles: Most of the company relies on an 800TB wide-striped storage system, with a multi-protocol (SMB, NFS, iSCSI) front-end from one manufacturer plugged into a Fibre-Channel attached back-end from another manufacturer. Intermittently, the back-end fries a disk, IO latency spikes, clients crash…
Additional Materials:

Lee Damon, University of Washington

Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in LISA (formerly SAGE) (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04, chaired CasITconf '11 and '13, and is co-chairing CasITconf '14.

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Linux
R3
Seven Habits of the Highly Effective System Administrator: Hints, Tricks, Techniques, and Tools of the Trade
Lee Damon, University of Washington

 
Mike Ciavarella, Coffee Bean Software Pty Ltd

9:00 am–5:00 pm

Washington 4 Room

We aim to accelerate the experience curve for junior system administrators by teaching them the tricks (and effective coping strategies) that experienced administrators take for granted and which are necessary for successful growth of both the administrator and the site.

The class covers many of the best practices that senior administrators have long incorporated in their work. We will touch on tools you should use, as well as tools you should try to avoid. We will touch on things that come up frequently, as well as those which happen only once or twice a year. We will look at a basic security approach.

Who should attend:

Junior system administrators, from those just starting out to those with 3+ years of experience in computer system administration. We will focus on enabling the junior system administrator to "do it right the first time." Some topics will use UNIX-specific tools as examples, but the class is applicable to any sysadmin and any OS. Most of the material covered is "the other 90%" of system administration—things every sysadmin needs to do and to know, but which aren't details of specific technical implementation.

Take back to work:

Ideas about how to improve and to streamline your systems and your workload, and, just as important, where to look to find more answers.

Topics include:
  • The five things every site should know
  • Why your computers should all agree on what time it is
  • Why root passwords should not be the same on every computer
  • Why backing up every file system on every computer is not always a good idea
  • Policies—where you want them and where you might want to avoid them
  • Ethical issues
  • Growth and success as a solo-sysadmin as well as in small, medium, and large teams
  • Training, mentoring, and personal growth planning
  • Site planning and roadmaps
  • Logistics
  • Books that can help you and your users
Additional Materials:
Friday, November 8, 2013

Full Day

James Sweeny, Puppet Labs

James Sweeny is a Professional Services Engineer for Puppet Labs. He spends most of his time flying around the world helping companies solve their systems management problems and ranting on endless tangents about the wonders of Puppet and open source software. He hopes one day to stop receiving phantom Nagios alerts from his past life as a sysadmin. Though he calls New York City home, he is rarely there; he is happy anywhere with good food and good beer.

Coding
F1
Introduction to Puppet Enterprise
New!
James Sweeny, Puppet Labs

9:00 am–5:00 pm

Washington 4 Room

This training course is intended as a one-day introduction for technical audiences who want to understand and practice the main concepts of Puppet Enterprise, Puppet Labs' IT automation software.

Who should attend:

System administrators, IT managers, and any students who are new to Puppet Enterprise and need a better understanding of the concepts of Puppet combined with hands-on experience with basic Puppet coding and implementation.

Take back to work:

The ability to discuss the benefits of Puppet Enterprise and to practice basic system administrator competencies in using the IT automation software to manage your infrastructure effectively.

Topics include:
  • About Puppet technology
  • Why Puppet?
  • Modules and classes
  • Puppet Agent and Puppet Master
  • Additional Puppet concepts
  • Puppet Forge
  • Puppet Enterprise
  • Puppet Labs technical curriculum and certification
Additional Materials:

David Rhoades, Maven Security Consulting Inc.

David Rhoades is a senior consultant with Maven Security Consulting Inc. (www.mavensecurity.com). Maven Security Consulting Inc. is a Delaware corporation that provides information security assessments and training to a global clientele. David's expertise includes Web application security, network security architectures, and vulnerability assessments. Past customers have included domestic and international companies in various industries, as well as various US government agencies. David has been active in information security consulting since 1996, when he began his career with the computer security and telephony fraud group at Bell Communications Research (Bellcore). David has a Bachelor of Science degree in Computer Engineering from the Pennsylvania State University.

Steve Pinkham, Maven Security Consulting Inc.

Steve Pinkham is a security researcher for Maven Security Consulting Inc. (www.mavensecurity.com). Steve has spent time in systems administration, programming, security research, and consulting. He dabbles in electronics and philosophy, and enjoys cheese and fine tea. If you're unlucky you might run into him wearing a backpack in some remote corner of the world. He holds a Bachelor of Science in Computer Science from Virginia Commonwealth University.

Linux
F2
Hands-on Introduction to Common Hacking Tools
New!
David Rhoades, Maven Security Consulting Inc.

 
Steve Pinkham, Maven Security Consulting Inc.

9:00 am–5:00 pm

Lincoln 3 Room

As an increasing number of valuable corporate assets are made available over computer networks, understanding the types of threats facing your organization, and possessing the skills required to identify and mitigate flaws on your networks, are increasingly important components of an organization's risk posture.

This full-day course will enable IT professionals to better understand the methodologies, tools, and techniques used by attackers against their technical infrastructure, with the primary aim of helping them develop better defense methods and capabilities. Attendees will learn to perform basic technical security vulnerability assessment tasks and gain a strong foundation for future studies in host and network security assessment.

The workshop will use Kali Linux™, a collection of free and open source security tools that many attackers and security practitioners use. Students will have access to a network of targets that will allow them to get hands-on experience, deepening their understanding of the process and tools and of how to counter them effectively. Guidance will be provided for continued learning after the course for students who wish to go on to master the tools and techniques introduced here.

Who should attend:

Auditors who want to better understand the methodologies, tools, and techniques used by attackers against their networks and who need help developing better policy.

Take back to work:

The ability to perform basic assessment tasks and a strong foundation for future studies in host and network security assessment.

Topics include:
  • Discussion of current and emerging attacker methods, techniques, concepts, and tools
  • Setting up and using Kali Linux, a collection of security tools (Kali is the successor to Backtrack)
  • Understanding the basic steps an attacker uses to penetrate a network
  • Discussing and using tools for network mapping and analysis
  • Configuring and using the OpenVAS vulnerability scanner to audit network and host security
  • Setting up and using Metasploit exploit framework to exploit found flaws
  • Understanding password cracking, enabling you to craft better authentication capabilities and audit password strength
  • An overview of building client-side exploits and basic anti-virus evasion techniques
  • Understanding the common Web flaws of SQL injection and Cross Site Scripting (XSS), and demonstrating their impact on various applications and solutions
  • References to remediate or implement compensating controls
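To make one of the web flaws in the list above concrete, here is a minimal sketch (my own illustration, not course material) of SQL injection against a string-built query, and the parameterized fix, using Python's standard sqlite3 module; the table and values are hypothetical:

```python
import sqlite3

# SQL injection in miniature: concatenating user input into a query
# lets the input rewrite the query; bound parameters keep it as data.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

evil = "nobody' OR '1'='1"

# Vulnerable: the OR clause smuggled in via `evil` matches every row.
rows = db.execute(
    "SELECT secret FROM users WHERE name = '" + evil + "'"
).fetchall()
print(rows)  # [('hunter2',)] -- leaks alice's secret

# Safe: the whole string is bound as a literal name, matching nothing.
rows = db.execute(
    "SELECT secret FROM users WHERE name = ?", (evil,)
).fetchall()
print(rows)  # []
```

The same pattern—attacker-controlled strings crossing a code/data boundary—underlies XSS as well, with HTML rather than SQL as the target language.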
Additional Materials:

Geoff Halprin, The SysAdmin Group

Geoff Halprin has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the USENIX Short Topics in System Administration book A System Administrator's Guide to Auditing. He was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

Soft Skills
F3
Combined MGMT-101/MGMT-201: Effective Communication and Team Leadership for Sysadmins
New!
Geoff Halprin, The SysAdmin Group

9:00 am–5:00 pm

Lincoln 2 Room

As a technical expert, you have learned how to wield the keyboard to great effect to solve technical problems. But you have also learned that this is not enough. To be effective, you need the support of others to fund, approve, assist with, contribute to, or support your work. People can make computers seem simple.

This tutorial is about becoming a more effective system administrator through improved communication skills. In part one, we will help you understand how communication works, so that it becomes a tool you can use instead of a source of frustration.

As you grow in seniority, a funny thing happens: you are expected to pass that wisdom on to others. You are given projects to run, teams to lead, apprentices to mentor, and ever larger budgets to manage effectively. The one thing, however, that you almost never receive is management training. In part two, we will turn our attention to team management. Management, like any system, can be learned. There are tools, techniques, and tips that you can call on to be effective in your "organization-facing" duties.

This is an immersion course.

Who should attend:

System administrators who wish to learn tools and tips that will help them communicate more effectively with their managers, users, and other important constituents; system administrators who have found themselves (or hope to be, or apprehensively anticipate) being given responsibility for "wetware systems" (i.e., other people).

Take back to work:

A bag full of tools to help you be more effective in your people-facing activities.

Topics include:
  • Oral communication (effective listening, effective talking, presentations)
  • Written communication (progress reporting, technical documentation, writing proposals, buy-vs.-build evaluations, cost-risk evaluations, audit reports)
  • Understanding others (understanding various communities, conflict resolution, personality types)
  • Time management
  • Risk management
  • Project management
Additional Materials:
Monday, February 17, 2014

Half Day Morning

Mark Grover, Cloudera, Inc.

Mark Grover is a committer on Apache Bigtop, a committer and PMC member on Apache Sentry (incubating) and a contributor to Apache Hadoop, Apache Hive, Apache Sqoop and Apache Flume. He is also a section author of O’Reilly’s Programming Hive. Mr. Grover presently works as a software engineer at Cloudera and frequently presents on Hadoop ecosystem technologies at software conferences.

M1
Introduction to Apache Hadoop and Its Ecosystem
Mark Grover, Cloudera, Inc.

9:00 am–12:30 pm

Ballroom A

Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it offers scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. Additionally, you'll learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.

Who should attend:

This session is intended for those who are new to Hadoop and are seeking to understand what Hadoop is, the ways that organizations are using it, and how it compares to and integrates with other systems. It assumes no prior knowledge of Hadoop, and explanations of technical topics like MapReduce and HDFS replication are clear and concise, making it appropriate for anyone attending the conference.

Topics include:
  • What Hadoop is and how organizations are using it
  • How the HDFS filesystem provides reliability and high throughput
  • How MapReduce enables parallel processing on large data sets
  • Explanations of some popular open source tools that integrate with Hadoop
  • Typical architecture of a Hadoop cluster
  • Considerations for hosting a Hadoop cluster
  • Emerging trends in the design and implementation of Hadoop
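The MapReduce model listed above—map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group independently—can be sketched in a few lines of plain Python. This is a toy word-count illustration of the model, not the Hadoop API:

```python
from collections import defaultdict

# Toy word count in the MapReduce style. Because each reduce group is
# independent, Hadoop can run reducers in parallel across a cluster.

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently.
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 2
```

In real Hadoop, the map and reduce functions run on different machines, HDFS holds the input and output, and the framework performs the shuffle over the network.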
Additional Materials:

W. David Schwaderer, Silicon Valley Consultant

W. David Schwaderer has a master's degree in Applied Mathematics from the California Institute of Technology and an MBA from the University of Southern California. He is the former Editor-in-Chief of the VERITAS Architect Network and the Symantec Technology Network. He presently consults for Samsung Semiconductor, Inc.'s Silicon Valley Systems Architecture Lab, where he helps world-class engineers develop Flash memory storage innovations that will intercept your family's future.

In all his writings and training seminars, David applies Einstein's (disputed) observation that "Everything should be made as simple as possible, but not simpler." As a multidisciplinary technologist, he has authored technical books on a wide spectrum of topics ranging from data storage systems, data management, communication signaling, C Language programming, ASIC core interfacing, and Digital Image Processing. He has seven books on innovation planned following 15 years of intense research on the topic. His 12th, and possibly last, technical book, co-authored with Jason Resch, is titled "Exabyte Data Preservation, Postponing the Inevitable."

David has presented at IEEE conferences, Stanford, MIT, Intel, Google, Sun/Oracle Labs, and across Silicon Valley. His four innovation Google TechTalks on YouTube have recorded nearly 39,000 views. At his recent Joint IEEE Comsoc-CEsoc SCV presentation titled "Broadcast Storage for Video-Intensive Worlds", he was accorded the title "Silicon Valley Icon."

On a good day, Google Web searches for "W. David Schwaderer" indicate about 1.5 million hits. But on a bad day, it's only around 900,000.

M2
Erasure Code Principles Made So Simple, You'll Really Like Them
W. David Schwaderer, Silicon Valley Consultant

9:00 am–12:30 pm

Ballroom B

Erasure Code storage applications (RAID 6, Object Storage, Information Dispersal, etc.) are all the rage, and deservedly so. They have intrinsic, engineering beauty and elegance that merit front-row seats in deep, advanced-technology discussions. But mastering Erasure Code principles can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to uncounted intrepid, serious, and competent engineers—maybe even you. Luckily, that's totally unnecessary.

This presentation's goal is to arm aspiring, inquisitive engineers with Erasure Code foundational insights, intuition, and fundamental understandings that enable them to totally dominate Erasure Code discussions, both on their home courts and on their own terms.

Make no mistake: this session intends to be fun, but technically informative at a deep, visceral level. There will even be a Python programming demonstration, time allowing. Erasure Code principles likely will never be made more accessible than what you experience here. This is the Erasure Code train to catch; don't be left behind.

Topics include:
  • Numbers, Counting Ducks, Clubs, and Special Club Members Such as 0 and 5
  • Elementary School Arithmetic—Addition and Multiplication
  • Powers and Inverse Powers—2x2x2x2 = 2^4 
  • Solving High School Equations—Determining Apple and Banana Prices
  • The Parallel Universes Around Us—Star Trek Stuff or Just GF(N)s?
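As a taste of the arithmetic behind these topics, here is a minimal sketch (my own illustration, not course material) of the simplest erasure code: XOR parity over GF(2). In GF(2), addition is XOR and every element is its own inverse, so "solving the equation" for a missing block is just XOR-ing the survivors:

```python
# Simplest erasure code: one XOR parity block over GF(2).
# Any single lost block is recoverable, because in GF(2)
# subtraction and addition are the same operation (XOR).

def make_parity(blocks):
    # Parity = XOR of all data blocks, bytewise.
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

def recover(survivors, parity):
    # XOR the parity with every surviving block to get the lost one.
    missing = parity
    for block in survivors:
        missing = bytes(a ^ b for a, b in zip(missing, block))
    return missing

data = [b"duck", b"club"]
p = make_parity(data)
print(recover([data[1]], p))  # b'duck' -- the lost block comes back
```

Tolerating two simultaneous failures (RAID 6 and beyond) requires arithmetic in a larger field such as GF(2^8)—the "parallel universes" the session alludes to—but the solving-for-unknowns idea is the same.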
Additional Materials:

Half Day Afternoon

Jeff Darcy, Red Hat

Jeff Darcy (S3) has worked on network and distributed storage systems for 20 years, including an instrumental role in developing MPFS (a precursor of modern pNFS) while at EMC. He is currently a member of the GlusterFS architecture team at Red Hat and frequently gives talks and tutorials about topics related to cloud storage.

M3
Building a Cloud Storage System
Jeff Darcy, Red Hat

1:30 pm–5:00 pm

Ballroom A

Cloud storage has become an important part of both the way that modern compute clouds are built and the service that they provide for users. This tutorial will explain what cloud storage systems have in common and what makes each one different, enabling attendees to select or build the right system for their specific needs.

Who should attend:

Primarily, people who wish to implement their own task-specific cloud storage systems. Secondarily, those who wish to understand the tradeoffs implicit in existing cloud storage systems.

Topics include:
  • Types of cloud storage: service for a cloud provider, service for cloud users, or service for consumers
  • Tradeoffs between consistency, performance, and availability
  • Special requirements: security and privacy, legal and regulatory compliance
  • Common techniques: membership and leader election, consistent hashing, vector clocks, Merkle trees, Bloom filters
  • Case studies: existing systems representing different tradeoffs and techniques
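Of the common techniques listed above, consistent hashing is perhaps the easiest to sketch. This is a minimal illustration (names and parameters are mine, not any particular system's):

```python
import bisect
import hashlib

# Minimal consistent-hash ring: keys and nodes share one hash space,
# and a key maps to the first node clockwise from the key's hash.
# Adding or removing a node remaps only the keys in that node's arc,
# not the whole key space.

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node appears `replicas` times as virtual nodes,
        # which smooths out the load distribution.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key):
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["storage-a", "storage-b", "storage-c"])
print(ring.node_for("user/42/photo.jpg"))  # one of the three nodes
```

Real systems layer replication, membership protocols, and failure detection on top of this placement function, which is where the tradeoffs between consistency, performance, and availability come in.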
Additional Materials:

Brent Welch, Google

Brent Welch is a senior software engineer at Google. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

M4
Clustered and Parallel Storage System Technologies
Brent Welch, Google

1:30 pm–5:00 pm

Ballroom B

This tutorial is oriented toward administrators and developers who manage and use HPC systems, and especially for those involved with storage systems in these environments. Storage is often a critical part of the HPC infrastructure. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.  

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices and SSDs, in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, and object-based parallel file system architectures.

Topics include:
  • Scaling the data path
  • Scaling metadata
  • Fault tolerance
  • Manageability

Specific systems are discussed, including Lustre, GPFS, PanFS, HDFS (Hadoop File System), OpenStack, and the NFSv4.1 standard for parallel I/O. We continue up the stack to discuss MPI-IO middleware that is often used in large parallel programming environments for efficient I/O at scale.

Additional Materials:
Tuesday, June 17, 2014

Half Day Morning

Daniel A. Menascé, George Mason University

Daniel Menascé is the author of over 225 papers and five books published by Prentice Hall and translated into Russian, Korean, and Portuguese. He has received two Outstanding Teaching awards from George Mason University and was a finalist in the 2014 Outstanding Faculty competition among all public and private higher education institutions in the state of Virginia. Menascé has given keynote addresses and presented tutorials at various conferences.

T1
Autonomic Computing and Its Applications
Daniel A. Menascé, George Mason University

9:00 am–12:30 pm

Washington AB

This tutorial provides an overview of autonomic computing (AC) and the various technologies that have been used to design and implement AC systems. Examples will be given in a variety of areas. The tutorial follows this outline:

  1. AC Overview (15 min)
  2. Techniques used: model-driven, learning-based, control-theory (45 min)
  3. Applications of AC (1 hr 45 min):
    • Cloud computing and data centers
    • Adaptive software systems
    • E-commerce and Web systems
    • SOA systems
    • Databases
    • Emergency departments
  4. Concluding Remarks (15 min)
Who should attend:

Researchers and Autonomic Computing (AC) practitioners. No prerequisites required.

Additional Materials:

Half Day Afternoon

Iqbal I. Mohomed, IBM T.J. Watson Research Center

Dr. Iqbal Mohomed is a Research Staff Member at IBM's T.J. Watson Research Center in NY. His research interests are in distributed systems and cloud and mobile computing. Most recently, he has been working on workload orchestration in cloud environments.

Dr. Mohomed earned his Ph.D. from the University of Toronto in 2008. His dissertation work was on automatic customization of web content for mobile devices. As a postdoc at Microsoft Research Silicon Valley, he worked on several middleware systems for mobile devices including the StarTrack middleware for efficient organization of user location data and the Contrail system for enabling secure communication across mobile users (awarded Best Paper at Middleware 2011). At IBM Research, Dr. Mohomed has worked on various projects including the use of personal mobile devices to enable efficient long-term health monitoring (HARMONI), a cloud-based monitoring system for virtual machines (Cloudscope) and a distributed system for optimizing placement of resources in a private cloud computing system.

Asser N. Tantawi, IBM T.J. Watson Research Center

Dr. Asser N. Tantawi is a Research Staff Member at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY. He received his Ph.D. degree in computer science from Rutgers University in 1982. Dr. Tantawi has published numerous articles in scientific journals and international conferences. His fields of interest include cloud optimization, workload management, analytics, systems modeling, stochastic analysis, model-based control, load balancing, resource optimization, and queueing models. He is a senior member of IEEE and a member of ACM and IFIP WG 7.3 (Computer System Modeling). He has also served as an ACM national lecturer.

T2
Autonomic Cloud Workload Optimization: Placement in OpenStack
Iqbal I. Mohomed, IBM T.J. Watson Research Center

Dr. Iqbal Mohomed is a Research Staff Member at IBM's T.J. Watson Research Center in NY. His research interests are in distributed systems and cloud and mobile computing. Most recently, he has been working on workload orchestration in cloud environments.

Dr. Mohomed earned his Ph.D. from the University of Toronto in 2008. His dissertation work was on automatic customization of web content for mobile devices. As a postdoc at Microsoft Research Silicon Valley, he worked on several middleware systems for mobile devices including the StarTrack middleware for efficient organization of user location data and the Contrail system for enabling secure communication across mobile users (awarded Best Paper at Middleware 2011). At IBM Research, Dr. Mohomed has worked on various projects including the use of personal mobile devices to enable efficient long-term health monitoring (HARMONI), a cloud-based monitoring system for virtual machines (Cloudscope) and a distributed system for optimizing placement of resources in a private cloud computing system.

 
Asser N. Tantawi, IBM T.J. Watson Research Center

Dr. Asser N. Tantawi is a Research Staff Member at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY. He received his Ph.D. degree in computer science from Rutgers University in 1982. Dr. Tantawi has published numerous articles in scientific journals and international conferences. His fields of interest include cloud optimization, workload management, analytics, systems modeling, stochastic analysis, model-based control, load balancing, resource optimization, and queueing models. He is a senior member of IEEE and a member of ACM and IFIP WG 7.3 (Computer System Modeling). He has also served as an ACM national lecturer.

1:30 pm–5:00 pm

Washington AB

This tutorial opens the door for the ICAC audience to apply autonomic computing ideas to the optimized deployment of workloads in the cloud. We have designed the tutorial to have two parts: (I) an overview of cloud management, OpenStack, Heat, and HOT technologies; and (II) optimization algorithms for solving the large-scale placement problem of workloads in the cloud in a scalable manner. Part I acts as an introduction to the area for those who may be experts in autonomic computing but are not familiar with the state of the art of cloud management. Part II should appeal to theoreticians and application-oriented attendees alike.

  • Overview of cloud management (1.5 hrs):
    • Overview of OpenStack open source cloud software
    • Heat template-driven orchestration engine
    • HOT: The Heat orchestration template
    • Cloud workload definition
    • Architecture of a workload placement engine
    • End-to-end flow
  • Workload Optimization (1.5 hrs)
    • Definition of workload placement optimization problem
    • Problem complexity and scalability
    • Algorithmic approaches to placement optimization
    • Examples and case studies
Additional Materials:
Thursday, June 19, 2014

Half Day Morning

David Nalley, Citrix Systems

David Nalley is a recovering systems administrator of 10 years who then spent an additional three years as a consultant. David is a committer on the Apache CloudStack project. Within the Fedora Project, David maintains a few dozen software packages, sponsors new packagers, and currently serves on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks frequently at IT and F/LOSS conferences.

R2
Using Apache CloudStack to Build a Big IaaS Cloud
David Nalley, Citrix Systems

David Nalley is a recovering systems administrator of 10 years who then spent an additional three years as a consultant. David is a committer on the Apache CloudStack project. Within the Fedora Project, David maintains a few dozen software packages, sponsors new packagers, and currently serves on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks frequently at IT and F/LOSS conferences.

9:00 am–12:30 pm

Riverview A

Infrastructure as a Service (IaaS) allows your customers to spin up VMs, services, and even machine images on demand. In this class, you will learn how to build IaaS clouds using Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability. Apache CloudStack works with KVM, vSphere, and XenServer/XCP for virtualization, and also supports Amazon Web Services.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud.

Prerequisites: Experience with virtualization platforms and deep understanding of layer two and three (L2/L3) networking is preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud based on Apache CloudStack in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack 
  • Next-generation network topologies, including SDN 
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes 
  • Maintaining availability in a "failure-is-assured" environment
Additional Materials:

Half Day Afternoon

Joshua Jensen, EMC

Joshua Jensen has been working with Linux for 18 years and is currently a Linux/Virtualization/Cloud Architect for EMC. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

R3
Jenkins for Continuous Integration
Joshua Jensen, EMC

Joshua Jensen has been working with Linux for 18 years and is currently a Linux/Virtualization/Cloud Architect for EMC. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

1:30 pm–5:00 pm

Riverview A

Your software development project needs a butler. Why? Your development team may be guilty of writing great code while only sporadically testing it, and pushing your code-base to production is tedious at best. Enter Continuous Integration with Jenkins. Learn how this development methodology and open source project can improve your software development process by consistently and continuously automating your software projects’ integration, testing, and deployment. I will walk you through a scenario of setting up a project using Jenkins. This course will conclude with a hands-on lab incorporating a GitHub project into an automated Jenkins Build Pipeline.

Who should attend:

DevOps administrators in need of sanity when testing and deploying software

Lab requirements: Each participant in this class will need a laptop with two virtual machines preinstalled with any Jenkins-compatible Linux distribution.

Take back to work:

Real world usage of Jenkins for better automation of testing and deployment with your team's software project.

Topics include:
  • Continuous integration concepts and best practices 
  • Jenkins concepts and configuration 
  • Jobs, job history, and build artifacts 
  • SCM integration and job triggering 
  • Jenkins cluster farming: multi-node management made easy 
  • Authentication with Active Directory, LDAP, and role-based authorization 
  • Jenkins plugins and third-party integration 
  • Distributed builds for heterogeneous architectures 
  • Build/test/deploy pipelines 
  • Hands-on lab incorporating a GitHub project into an automated Build Pipeline
Additional Materials:
Friday, June 20, 2014

Full Day

Salim Virji, Google

Salim Virji has been developing software and administering systems for 20 years. After working as an applications software engineer at several startups, he found his interest drawn to distributed file systems and cluster computing while pursuing a hobby project with friends.

At Google, he has worked on developing and deploying the company's core software building blocks. Since 2003, he has held the position of Site Reliability Engineer, responsible for technical infrastructure such as Bigtable, Chubby, and GFS; more recently, he has joined the team developing software infrastructure for Google Apps such as Docs and Presentations. He also collaborates with Google's in-house training programs for engineers, developing curriculum and presenting hands-on workshops to introduce engineers to Google technology and techniques.

He received an A.B. in Classics from the University of Chicago.

F1
SRE University—Practical Large System Design
Salim Virji, Google

Salim Virji has been developing software and administering systems for 20 years. After working as an applications software engineer at several startups, he found his interest drawn to distributed file systems and cluster computing while pursuing a hobby project with friends.

At Google, he has worked on developing and deploying the company's core software building blocks. Since 2003, he has held the position of Site Reliability Engineer, responsible for technical infrastructure such as Bigtable, Chubby, and GFS; more recently, he has joined the team developing software infrastructure for Google Apps such as Docs and Presentations. He also collaborates with Google's in-house training programs for engineers, developing curriculum and presenting hands-on workshops to introduce engineers to Google technology and techniques.

He received an A.B. in Classics from the University of Chicago.

9:00 am–5:00 pm

Riverview A

In this class, you will learn about large system design. Truly large-scale systems are still rare, and in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system, so they outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm that a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who can also fill in the details are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to an SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data security impact the cost of implementation. You will be involved in classroom exercises in small groups, each with its own Google SRE mentor, while working out these designs.

Who should attend:

System administrators, SREs, and DevOps engineers who have some familiarity with distributed systems, server hardware, and systems programming, especially those who would like to work with, procure, or build large distributed systems.

Take back to work:

How to design large distributed systems, how to evaluate design proposals, and how to explain such designs to third parties.

Topics include:
  • Design patterns for large distributed systems 
  • Monitoring large-scale systems 
  • Large-scale design workshop and presentations 
  • Non-abstract design; taking a design and producing a "bill of materials" 
  • Designing to fail; how to work around rack, networking, and datacenter failures
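The last two topics invite a worked example. As a minimal sketch (all numbers are hypothetical, not taken from the course materials), here is the kind of back-of-envelope calculation that turns a capacity target into a fleet size that still meets peak load after losing a datacenter:

```python
import math

def total_servers(peak_qps, qps_per_server, datacenters):
    """Servers needed so that losing any one datacenter still leaves
    enough capacity for peak load (assumes an even spread of servers)."""
    base = math.ceil(peak_qps / qps_per_server)
    # After one of D datacenters fails, (D - 1)/D of the fleet survives,
    # so the fleet must be overprovisioned by a factor of D/(D - 1).
    return math.ceil(base * datacenters / (datacenters - 1))

# Hypothetical workload: 90,000 QPS peak, 500 QPS per server, 3 sites.
print(total_servers(90_000, 500, 3))  # 180 base servers -> 270 total
```

The same overprovisioning argument applies one level down, to rack and networking failures within a site.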
Additional Materials:
Sunday, November 9, 2014

Full Day

Chris Jones, Google, Inc.

A computational daemonologist, Chris works in San Francisco as a Site Reliability Engineer for Google App Engine, a platform serving over 28 billion requests per day. He was previously responsible for the care and feeding of advertising statistics, data warehousing, and customer support systems, joining Google in 2007. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in computer engineering, economics, and technology policy along the way.

S1
SRE Classroom: Practical Large Scale Design
Chris Jones, Google, Inc.

A computational daemonologist, Chris works in San Francisco as a Site Reliability Engineer for Google App Engine, a platform serving over 28 billion requests per day. He was previously responsible for the care and feeding of advertising statistics, data warehousing, and customer support systems, joining Google in 2007. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in computer engineering, economics, and technology policy along the way.

9:00 am–5:00 pm

Grand Ballroom C

Truly large-scale systems are still rare, and in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system, so they outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm that a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who can also fill in the details are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to a hard SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data security impact the cost of implementation.

This class will *not* cover concurrent programming or the software-engineering side of distributed systems.

Who should attend:

System administrators who want to understand large systems, systems engineers who want to build them, and business people who expect to procure them.

Take back to work:

The ability to assess whether a design will work before it is built, to communicate large system designs to systems architects and business people, and to assess requirements, communicate constraints, and predict the performance of large systems before money is spent.

Topics include:
  • How design works
  • Finding bottlenecks in systems
  • Large scale design patterns
  • Lab work: gathering requirements in teams
  • Lab work: estimating hardware requirements
  • Lab work: building in reliability
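The hardware-estimation lab above boils down to arithmetic over the requirements. Here is a minimal sketch (the workload figures are invented for illustration, not the class's numbers) of sizing storage for an Imgur-like image service:

```python
import math

# Hypothetical requirements for an image-sharing service:
uploads_per_day = 1_000_000
avg_image_bytes = 500 * 1024      # 500 KiB per image
replication     = 3               # durable copies of each object
retention_days  = 365
disk_bytes      = 4 * 10**12      # 4 TB drives

# Bytes written over the retention window, times replication,
# divided by per-drive capacity, gives the drive count.
raw_bytes = uploads_per_day * avg_image_bytes * retention_days
disks = math.ceil(raw_bytes * replication / disk_bytes)
print(disks)  # 141 drives, before filesystem overhead and growth headroom
```

Presenting the inputs and the formula separately, as above, is what lets a reviewer challenge each assumption rather than the final number.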
Additional Materials:

Jennifer Davis, Chef

Jennifer has years of experience improving platform development efficiency by building reliable large-scale services, reducing the complexity of product implementations, and automating infrastructure. As a Chef Automation Consulting Engineer, she helps companies discover their own best practices. Prior to joining Chef, she was a lead service engineer at Yahoo! within cloud platform services, supporting the multi-tenant Hadoop environment and Sherpa. She is also a sparkly devops princess who is passionate about increasing diversity in tech. She doesn’t play favorites for programming languages or editors, but her favorite board game is Waterdeep.

S3
Hadoop Operations
Jennifer Davis, Chef

Jennifer has years of experience improving platform development efficiency by building reliable large-scale services, reducing the complexity of product implementations, and automating infrastructure. As a Chef Automation Consulting Engineer, she helps companies discover their own best practices. Prior to joining Chef, she was a lead service engineer at Yahoo! within cloud platform services, supporting the multi-tenant Hadoop environment and Sherpa. She is also a sparkly devops princess who is passionate about increasing diversity in tech. She doesn’t play favorites for programming languages or editors, but her favorite board game is Waterdeep.

9:00 am–5:00 pm

Grand Ballroom B

If you are interested in running Hadoop and need to understand how to manage a cluster rather than how to write Java applications, this course is for you. If you inherited a non-production environment that now has mission-critical data and you need to harden that Hadoop environment for production, then this course is also for you. In this class, I will teach you the core set of skills needed to deploy, configure, and monitor resources, with an emphasis on giving you the basics so you can use self-directed learning to go deeper. While there will be some exercises to show you how to use Hadoop, the focus of this tutorial and its hands-on exercises is operations.

Who should attend:

Unix system administrators interested in an introduction to Hadoop operations.

Take back to work:
  • Techniques for deploying, configuring, monitoring, and securing a Hadoop cluster
  • Essential Hadoop troubleshooting skills
  • A working Hadoop environment on your laptop
Topics include:
  • Overview of Hadoop and HDFS
  • Sandbox setup for class exercises
  • Architecture
  • Installation
  • Configuration
  • Logging
  • Monitoring
  • Security
Additional Materials:

Dr. Neil J. Gunther, Performance Dynamics Company

Neil Gunther, M.Sc., Ph.D., is a researcher specializing in performance and capacity management. Prior to starting his own consulting company in 1994 (www.perfdynamics.com), Neil worked on the NASA Voyager and Galileo missions, the Xerox PARC Dragon multiprocessor, and the Pyramid/Siemens RM1000 parallel cluster. Neil has authored many technical articles and several books, including Guerrilla Capacity Planning (Springer, 2007) and the second edition of Analyzing Computer System Performance with Perl::PDQ (Springer, 2011), and received the A.A. Michelson Award in 2008.

S4
Tactical Capacity Planning for Sysadmins
Dr. Neil J. Gunther, Performance Dynamics Company

Neil Gunther, M.Sc., Ph.D., is a researcher specializing in performance and capacity management. Prior to starting his own consulting company in 1994 (www.perfdynamics.com), Neil worked on the NASA Voyager and Galileo missions, the Xerox PARC Dragon multiprocessor, and the Pyramid/Siemens RM1000 parallel cluster. Neil has authored many technical articles and several books, including Guerrilla Capacity Planning (Springer, 2007) and the second edition of Analyzing Computer System Performance with Perl::PDQ (Springer, 2011), and received the A.A. Michelson Award in 2008.

9:00 am–5:00 pm

Grand Ballroom D

Most system administrators are already familiar with diagnosing and tuning computer systems using performance data captured by open source or commercial monitoring tools such as Splunk, Graphite, and BMC Patrol.

This full-day tutorial will show you how to get beyond performance monitoring to performance analysis and capacity planning. These skills are in greater demand than ever for sizing (over-engineering can't improve single-threaded performance), for procurement (try buying a 10 GHz processor), and for ensuring the scalability of large infrastructures used in both private and public clouds.

Since computer hardware has become a mass-produced commodity, its cost no longer drives capacity planning in the strategic sense. The capacity part has become cheap and easy; it's the planning part that requires skill. And capacity planning is not just about the future anymore. Rather, it needs to respond rapidly to the fast-paced changes and tighter budgets of modern business environments. Enter tactical planning: Guerrilla-style capacity planning.

Who should attend:

Anyone looking for job security by improving their skill set to include capacity management. No specialized background in performance analysis or capacity planning is assumed. A working knowledge of Linux or Unix performance tools will be helpful.

Take back to work:

The ability to start analyzing performance data you may already have collected to forecast system capacity and predict bottlenecks that can hinder system scalability.

Topics include:
  • What is performance and capacity management?
  • The Guerrilla approach to capacity planning.
  • Monitoring the volatile technology marketplace for procurement.
  • The three performance metrics you need to know.
  • Who ordered multicores and what are their performance limitations?
  • Statistical forecasting with R.
  • How to establish a capacity line.
  • Queueing analysis for those who can’t wait.
  • How to use PDQ for bottleneck analysis in R, C, Perl, and Python.
  • Quantifying scalability using Amdahl's law and the Universal Scalability Law.
  • Virtualization capacity management from core hyperthreads to cloud hyperservices.
  • Scalability analysis of Xen, VMware and WebLogic virtualized servers.
  • Case studies in capacity planning for large-scale web sites and multi-tier applications.
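The scalability topic above has a compact closed form. As a toy sketch (this code and its parameter values are mine for illustration, not Dr. Gunther's PDQ library), the Universal Scalability Law models relative capacity C(N) with a contention term sigma and a coherency term kappa:

```python
import math

def usl_capacity(n, sigma, kappa):
    """C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))"""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# With any coherency cost (kappa > 0), throughput peaks and then
# retrogrades; the peak is at N* = sqrt((1 - sigma) / kappa).
sigma, kappa = 0.05, 0.001
n_star = math.sqrt((1 - sigma) / kappa)
print(round(n_star))                             # 31 processors
print(round(usl_capacity(32, sigma, kappa), 1))  # ~9x one processor
```

In practice sigma and kappa are fitted by regression to measured throughput data, which is where the statistical forecasting with R listed above comes in.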
Additional Materials:

Half Day Morning

Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BS levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

S2
Statistics for Ops: Making Sense Out of Data
Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BS levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

 
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

9:00 am–12:30 pm

Grand Ballroom A

This tutorial is a course in statistics with a specific focus on system administrators and the types of data they face. We assume little prior knowledge of statistics and cover the most common concepts in descriptive statistics, applying them to data taken from real-life examples. Our aim is to provide insight into which methods support sound interpretation of data: distributions, probability, and formulating basic statements about the properties of observed data.

The tutorial instructors will be available in the Lab Space following the tutorial to answer questions and offer personal feedback on cases attendees wish to investigate with their own data.

Who should attend:

Sysadmins who are faced with data overload and wish they had some knowledge of how statistics can be used to make more sense of it. We assume little prior knowledge of statistics, but a basic mathematical proficiency is recommended.

Take back to work:
  • A fundamental understanding of how descriptive statistics can provide additional insight into data in the sysadmin world, enabling further self-study of statistics
  • A basic set of statistical approaches that can be used to identify fundamental properties of the data they see in their own environments, and to identify patterns in that data
  • The ability to make accurate and clear statements about metrics that are valuable to their organization
Topics include:
  • Descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
  • Basic analysis of distributions and probabilities using percentiles typically seen in ops
  • Interpretation of analyses to include team and business implications
  • Regression analysis to suggest predictive relationships, with an emphasis on interpretation and implications
  • Correlation analysis and broad pattern detection (if time allows)
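As a taste of the material above, here is a minimal sketch (the latency sample is invented) of why the median and a high percentile often describe ops data better than the mean:

```python
import statistics

# Ten request latencies in milliseconds, with a long tail:
latencies_ms = [12, 14, 15, 15, 16, 18, 21, 25, 40, 95]

mean   = statistics.mean(latencies_ms)    # 27.1, pulled up by the outlier
median = statistics.median(latencies_ms)  # 17.0, robust to the tail
mode   = statistics.mode(latencies_ms)    # 15, the most common value
# quantiles() with n=10 returns the nine deciles; index 8 is the 90th
# percentile, a common SLA target.
p90 = statistics.quantiles(latencies_ms, n=10)[8]
print(mean, median, mode, p90)
```

The gap between the mean and the median is itself a useful signal: it tells you the distribution is skewed and that a single "average" number would mislead.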
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration,” has just been released.

S5
Work Like a Team: Best Practices for Team Coordination and Collaborations So You Aren't Acting Like a Group of Individuals
Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration,” has just been released.

9:00 am–12:30 pm

Cedar Room AB

System Administration is a team sport. How can we better collaborate and work as a team? Techniques will include many uses of Google Docs, wikis and other shared document systems, as well as strategies and methods that create a culture of cooperation.

Who should attend:

System administrators and managers who work on a team of three or more.

Take back to work:
  • Understanding the different roles people play within a team
  • Behavior that builds team cohesion
  • Three uses of Google Docs you had not previously considered
  • How to organize team projects to improve teamwork
  • How to track projects using Kanban boards
  • How to divide big projects among team members
  • How to collaborate via the "Tom Sawyer fence painting" technique
  • How to criticize the work of teammates constructively
  • How to get agreement on big plans
Topics include:
  • Meetings: How to make them more effective, shorter, and more democratic
  • How to create accountability, stop revisiting past decisions, and improve involvement
  • Strategies for leaving “fire-fighting” mode and becoming more “project-focused”
  • Project work: Using “design docs” to get consensus on big and small designs before they are committed to code
  • Service docs: How to document services so any team member can cover for any other
  • Kanban: How to manage work that needs to be done
  • Chatroom effectiveness: How to make everyone feel included and avoid losing important decisions
  • Playbooks: How to get consistent results across the team, train new hires, and make delegation easier
  • Sending more effective email: How to write email that gets read


(NOTE: This class is a reboot of last year’s “Advanced Time Management: Team Efficiency”)

Additional Materials:

Half Day Afternoon

Jason Maughan, PurePredictive, Inc.

Jason Maughan is a predictive data analyst at PurePredictive, Inc., where his primary role is machine learning ensemble construction and optimization. He enjoys data exploration and visualization using tools such as R. His mission is to find hidden patterns in data to explain the past and predict the future. Jason holds a Masters in Management Information Systems from Utah State University.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

S6
R for Sysadmins
Jason Maughan, PurePredictive, Inc.

Jason Maughan is a predictive data analyst at PurePredictive, Inc., where his primary role is machine learning ensemble construction and optimization. He enjoys data exploration and visualization using tools such as R. His mission is to find hidden patterns in data to explain the past and predict the future. Jason holds a Masters in Management Information Systems from Utah State University.

 
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

1:30 pm–5:00 pm

Grand Ballroom A

We provide a brief introduction to the R programming and statistics language, with a focus on exploratory data analysis for sysadmins. We assume little prior knowledge of statistics and no prior knowledge of the R language or programming environment. The course includes data input, basic manipulation, visualization and plotting, and basic analysis in R. The instructors will be available in the LISA Labs space after the tutorial for attendees that have additional questions or have brought their own data. R (http://www.r-project.org/) and RStudio (http://www.rstudio.com/), an IDE for R, should be installed prior to attending.

Who should attend:

Sysadmins who would like an introduction to R as a tool for gaining additional insight into their wealth of data using R’s statistical and visualization capabilities. We assume little prior knowledge of R or statistics, but basic mathematical proficiency is recommended.

Take back to work:
  • Experience with basic methods and data types in R
  • Experience with basic visualizations in R
  • Basic understanding of data exploration and analysis to identify patterns in R, such as correlations, regressions, and decision trees (as time allows)
Topics include:
  • An introduction to R and the RStudio programming environment
  • Basic instructions for loading, manipulating, and saving data
  • Basic functions and algorithms for exploring data, and the types of analysis useful for sysadmins
  • An introduction to descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
  • Basic visualizations in R, including histograms, scatterplots, and heatmaps (as time allows)
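The class itself uses R, but the descriptive statistics it introduces behave the same in any environment. As a language-neutral taste (the sample values are invented), the same measures in Python:

```python
from statistics import mean, median, mode

# Hypothetical sample: response times in ms pulled from a server log
samples = [12, 15, 15, 18, 22, 15, 30, 12, 15]

print("mean:  ", round(mean(samples), 2))    # arithmetic average
print("median:", median(samples))            # middle value, robust to outliers
print("mode:  ", mode(samples))              # most frequent value
print("range: ", max(samples) - min(samples))
```

In R, mean() and median() are direct equivalents, and range() returns the min/max pair; RStudio makes exploring them interactive.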
Additional Materials:

Jean-Charles Lopez, Inktank

Jean-Charles has been an instructor in IT storage environments for the last 15 years, working with equipment from many different vendors. He joined Inktank (since acquired by Red Hat, of which it is now a division) in December 2013 as Senior Technical Instructor and head of Inktank University, helping drive the adoption of Ceph in general and the Red Hat Inktank Ceph Enterprise (Red Hat ICE) offering in particular.

S7
Introduction to Ceph
Jean-Charles Lopez, Inktank

Jean-Charles has been an instructor in IT storage environments for the last 15 years, working with equipment from many different vendors. He joined Inktank (since acquired by Red Hat, of which it is now a division) in December 2013 as Senior Technical Instructor and head of Inktank University, helping drive the adoption of Ceph in general and the Red Hat Inktank Ceph Enterprise (Red Hat ICE) offering in particular.

1:30 pm–5:00 pm

Cedar Room AB

Learn about the history of Ceph and how Ceph, as Software-Defined Storage, uses its different components to offer object access (S3- and Swift-compatible), block-level access, and file-level access. The lecture is combined with instructor-led demonstrations using Ceph.

Who should attend:

Any system or storage administrator who wants to get a better and more in-depth understanding of Ceph. Any CIO or CTO who would like to know how Ceph could benefit their infrastructure and help contain costs.

Take back to work:
  • Virtual Machine images for playing with Ceph
  • Lab instructions on how to set up and play with a test cluster
Topics include:
  • Overview of the Ceph Architecture
  • Different access methods to the Ceph Storage Cluster
  • Hands-on using Ceph
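One idea behind the architecture covered above: Ceph clients compute where data lives instead of asking a central lookup service. An object name hashes to a placement group, and the CRUSH algorithm then maps that group to OSDs. A toy Python sketch of the first step (Ceph's real hash function and a pool's PG count differ; both are simplified here):

```python
import hashlib

PG_NUM = 64  # number of placement groups in this hypothetical pool

def object_to_pg(name: str) -> int:
    """Deterministically map an object name to a placement group,
    in the spirit of Ceph's name-hashing step (Ceph uses a different hash)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_NUM

# Every client computes the same placement with no central metadata lookup:
assert object_to_pg("vm-image-001") == object_to_pg("vm-image-001")
```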
Additional Materials:
Monday, November 10, 2014

Full Day

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

M1
Linux Performance Tuning
Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

9:00 am–5:00 pm

Grand Ballroom C

The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
  • Characterizing your workload's requirements
  • Finding bottlenecks
  • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
  • Latency vs. throughput
  • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
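Much of the measurement work above starts with reading /proc. A minimal sketch of parsing /proc/meminfo-style output (the sample text is hardcoded so the snippet stands alone):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

sample = """MemTotal:       16384000 kB
MemFree:         2048000 kB
Cached:          8192000 kB"""

mem = parse_meminfo(sample)
# MemFree alone understates headroom; page cache is reclaimable too
print((mem["MemFree"] + mem["Cached"]) // 1024, "MB roughly available")
```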
Additional Materials:

Nathen Harvey, Chef

Nathen Harvey is the Director of Community at Chef. Before joining Chef, Nathen spent a number of years managing operations and infrastructure for a number of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps. He participates in many different meetup groups and organizes the DevOpsDC Meetup Group. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and foursquare.

M3
Testing Your Automation Code
Nathen Harvey, Chef

Nathen Harvey is the Director of Community at Chef. Before joining Chef, Nathen spent a number of years managing operations and infrastructure for a number of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps. He participates in many different meetup groups and organizes the DevOpsDC Meetup Group. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and foursquare.

9:00 am–5:00 pm

Cedar Room AB

This is a hands-on tutorial that will cover the basics that everyone needs to know about how to test your automation code. We’ll start off with a quick introduction to Chef and work our way through writing a fully-tested cookbook or two using linting, unit testing, integration testing, and cross-platform testing.

Hands-on exercises throughout the tutorial will reinforce the material discussed.

Who should attend:

Anyone responsible for managing infrastructure, especially those interested in automating its provisioning and management using state-of-the-art tools and practices.

Take back to work:

A working code base that includes samples for building out testable infrastructure components.

Topics include:
  • Introduction to Chef
  • Test-driven development (TDD)
  • Syntax checking with knife cookbook test
  • Linting with Foodcritic
  • Unit testing with ChefSpec
  • Integration testing with ServerSpec
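The tooling above is Ruby-based, but the TDD loop it supports is language-neutral: write an assertion about the desired output first, then make the automation code satisfy it. A Python sketch with a hypothetical template-rendering helper (not part of Chef):

```python
def render_ntp_conf(servers):
    """Render a minimal ntp.conf from a list of server names;
    a stand-in for what a Chef template resource would produce."""
    if not servers:
        raise ValueError("at least one NTP server is required")
    return "\n".join(f"server {s} iburst" for s in servers) + "\n"

# ChefSpec-style unit tests assert on rendered output, not on a live host:
conf = render_ntp_conf(["0.pool.ntp.org", "1.pool.ntp.org"])
assert "server 0.pool.ntp.org iburst" in conf
assert conf.count("server ") == 2
```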
Additional Materials:

Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

M4
Hands on Security for System Administrators
Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

9:00 am–5:00 pm

Grand Ballroom B

System administrators often run into interesting conflicts between sysadmin, customer, and IT security needs. These conflicts generally lead to difficulty in achieving a balance between administrative convenience, good customer service, and minimal risk. There are processes or services available that can significantly improve any of these areas; however, many times they are costly or resource intensive. This course is designed for system administrators who want to improve the security posture of their organizations, using IT security standards paired with good system administration practices, resulting in better service, lower risk, and minimal impact to time and budget.

We will walk a path, covering many domains of IT security and enterprise system administration in ways that utilize the interconnection between good system administration practices and security fundamentals. We will discuss recent risks and threats, analyze them with respect to your environment, review possible impacts and develop mitigations that you can apply immediately. Training includes instruction, discussion, many hands-on labs, and a strong dose of common sense.

Attendees should bring a laptop capable of running a Virtual Guest and will be provided a VM in which to work. The class will have access to a test network of systems for training and lab exercises.

Who should attend:

Beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to improve their security. It is suggested that participants have experience with the *nix command line.

Take back to work:

A toolbox of documentation, (mostly) free software, and a good starting point for developing better practices to improve security and system administration.

Topics include:
  • The relationship between system administration and IT security
  • Security theories, standards, and risk mitigation as applied by SAs
  • Information management using Trac and Subversion
  • Good system administration practices that directly improve IT security
  • Basic configuration driven system management using Puppet
  • Host and network auditing, hardening, and monitoring
  • Developing an effective security awareness program
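As one concrete example of a practice that directly improves IT security: file-integrity monitoring, the idea behind tools such as Tripwire and AIDE, reduces to baselining checksums and alerting on drift. A self-contained sketch (the file contents are inlined rather than read from disk):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def changed_files(baseline: dict, current: dict) -> list:
    """Compare two {path: checksum} maps; report modified or missing paths."""
    return sorted(path for path in baseline
                  if current.get(path) != baseline[path])

baseline = {"/etc/passwd": sha256_of(b"root:x:0:0::/root:/bin/bash\n")}
tampered = {"/etc/passwd": sha256_of(b"root:x:0:0::/root:/bin/sh\n")}
assert changed_files(baseline, baseline) == []
assert changed_files(baseline, tampered) == ["/etc/passwd"]
```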
Additional Materials:

Half Day Morning

Nick Chase, Mirantis

Nick Chase has 20+ years of experience as a developer and author. He has written several books and hundreds of articles as an IBM developerWorks Certified Master Author, founded NoTooMi.com, and has done web application development for companies such as Alcatel-Lucent, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. He is currently Mirantis' Technology Marketing Manager and the Editor of OpenStack:Now.

M2
Understanding OpenStack
Nick Chase, Mirantis

Nick Chase has 20+ years of experience as a developer and author. He has written several books and hundreds of articles as an IBM developerWorks Certified Master Author, founded NoTooMi.com, and has done web application development for companies such as Alcatel-Lucent, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. He is currently Mirantis' Technology Marketing Manager and the Editor of OpenStack:Now.

9:00 am–12:30 pm

Grand Ballroom D

This session is designed to give you an understanding of the architecture of OpenStack, how the pieces fit together, and how you can make use of it in your environment. The session provides an understanding of the OpenStack architecture and how to use it. It also includes a hands-on lab during which students will be able to provision and use resources from an existing OpenStack cluster. Students need to bring an internet-capable laptop with a modern browser.

Who should attend:

This tutorial will be best suited for those who are either curious about or considering OpenStack, but don't have a deep understanding of how it works. This can include users from companies who are considering private or public cloud, or who are already virtualized in some capacity, such as using vCenter, or who are simply interested in increasing capacity and speed-to-market.

Take back to work:

Attendees will go back to work with an understanding of how OpenStack fits into a business environment, and a good grounding in how it works and how it can be applied in their own situations.

Topics include:
  • OpenStack architecture and components
  • How requests flow through an OpenStack cluster
  • Virtual machine provisioning and workload scheduling
  • Networking architecture
  • Storage architecture
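Everything in OpenStack is driven through REST APIs; provisioning a VM means POSTing a JSON body naming an image and a flavor to the compute service (Nova). A sketch of building that body (the image UUID and flavor ID are placeholders):

```python
import json

def server_create_request(name, image_ref, flavor_ref):
    """Build the JSON body for a Nova v2 'create server' call
    (POST /v2/{tenant_id}/servers)."""
    return {"server": {"name": name,
                       "imageRef": image_ref,
                       "flavorRef": flavor_ref}}

body = server_create_request("web01", "IMAGE-UUID-PLACEHOLDER", "2")
print(json.dumps(body, indent=2))
```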
Additional Materials:

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

M5
Winning the Budget Game: How to Get the Money You Need for IT Every Time
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

9:00 am–12:30 pm

Grand Ballroom A

Getting the resources your team needs is a matter of knowing just enough about finances to communicate what you want, and explaining how it will benefit the company. The trick is to speak their language. Learn the basics of budgeting, benchmarking, resource allocation, cost-benefit analysis, and communicating costs using tools like net present value. Understand the importance of your business cycle and the difference between cap-ex and op-ex. The course will also cover spreadsheet magic, like pivot tables and pivot charts.

Who should attend:

Sysadmins who want to save their IT budgets, and need to learn how to speak “MBA”. This tutorial will help you do just that, arming you with the right approach and the right tools to go to management and communicate your needs and the alternatives available, so that you’re a part of the budget conversation instead of just the recipient of budget decisions.

Take back to work:
  • Understand the differences between cap-ex and op-ex, and leverage differences in business cycles and settings.
  • Write a basic capital proposal, including relevant value analysis.
  • Draft a budget, complete with impacts.
  • Compare alternatives and communicate relevant costs.
  • Gain a fundamental understanding of pivot tables and pivot charts.
Topics include:
  • Capital expenses vs. operational expenses: knowing the difference, and knowing why the difference matters
  • Important business cycles and dates
  • Comparing alternatives and communicating relevant costs
  • Cost-benefit analysis
  • Budgeting, including drafting a budget, burn rates, and standard costs and variances
  • Capital planning, including net present value, PV index, internal rate of return, and payback period
  • Pivot tables and pivot charts
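Two of the capital-planning measures above fit in a few lines of code: net present value discounts each year's cash flow by (1 + r)^t, and payback period is the first year cumulative cash flow reaches zero. A sketch with invented numbers:

```python
def npv(rate, cashflows):
    """Net present value; cashflows[0] is today (t=0), then one per year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def payback_period(cashflows):
    """First year in which cumulative cash flow reaches zero, or None."""
    total = 0.0
    for t, cf in enumerate(cashflows):
        total += cf
        if total >= 0:
            return t
    return None

# Spend $1000 on hardware now, save $500/year in op-ex for three years, 10% rate:
flows = [-1000, 500, 500, 500]
print(round(npv(0.10, flows), 2))   # positive: the proposal clears the hurdle
print(payback_period(flows))        # breaks even in year 2
```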
Additional Materials:

Half Day Afternoon

John Sellens, SYONEX

John Sellens has been involved in system and network administration for over 25 years, and has been teaching and writing on related topics for many years. He holds an M.Math. in computer science from the University of Waterloo. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the operations team at FreshBooks.

M6
Elasticsearch, Logstash, and Other Data
John Sellens, SYONEX

John Sellens has been involved in system and network administration for over 25 years, and has been teaching and writing on related topics for many years. He holds an M.Math. in computer science from the University of Waterloo. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the operations team at FreshBooks.

1:30 pm–5:00 pm

Grand Ballroom D

Elasticsearch is a distributed and reliable data store that can be used for a variety of purposes. One use of particular interest to system administrators is as a storage engine for Logstash. This tutorial covers how to implement an Elasticsearch cluster and use Logstash and related tools to store and query log data (syslog, web logs, etc.).

Who should attend:

System administrators who need a tool to aggregate and examine log data across their environment.

Take back to work:

Participants will leave the tutorial ready to implement and manage an Elasticsearch cluster, to store and analyze their logs and other data with Logstash, and to use Elasticsearch with other kinds of data.

Topics include:
  • An introduction to Elasticsearch
  • How to implement and manage a replicated and distributed data store
  • How to use Logstash to store log files (or other time-stamped data)
  • Tools for querying and analyzing those logs
  • Ways to use and abuse Elasticsearch for other types of data
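A Logstash event is just a JSON document stamped with @timestamp, and getting it back is a JSON query body against Elasticsearch's search API. A sketch of building such a body in Python (the field names are examples, and the exact DSL shape varies across Elasticsearch versions):

```python
import json

def recent_events_query(host, since):
    """Build an Elasticsearch query-DSL body: match a host field,
    filtered to a time range on Logstash's @timestamp."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"host": host}}],
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        }
    }

body = recent_events_query("web01", "now-1h")
print(json.dumps(body))
```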
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

M7
Evil Genius 101: Subversive Ways to Promote DevOps and Other Big Changes
Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

1:30 pm–5:00 pm

Grand Ballroom A

You want to innovate: deploy new technologies such as configuration management, kanban, a wiki, or standardized configurations. Your coworkers don’t want change: they like the way things are. Therefore, they consider you evil. However, you aren’t evil; you just want to make things better. Learn how to talk your team, managers, and executives into adopting DevOps techniques and culture.

Who should attend:

Sysadmins and managers looking to influence the technology and culture of their organization.

Take back to work:
  • Help your coworkers understand and agree with your awesome ideas
  • Convince your manager about anything. Really.
  • Get others to trust you so they are more easily convinced
  • Decide which projects to do when you have more projects than time
  • Turn the most stubborn user into your biggest fan
  • Make decisions based on data and evidence
Topics include:
  • DevOps "value mapping" exercise: Understand how your work relates to business needs.
  • So much to do! What should you do first?
  • How to sell ideas to executives, management, co-workers, and users.
  • Simple ways to display data to get your point across better.
Additional Materials:
Tuesday, November 11, 2014

Full Day

Stuart Kendrick, EMC Isilon

Stuart Kendrick works as a Sustaining Engineer for EMC Isilon. He has worked in software development, help desk, desktop support, system administration, and network support. He spent two decades at the Fred Hutchinson Cancer Research Center in a multi-disciplinary role covering transport, network management, root cause analysis, and Problem Management. He is happiest when correlating packet traces with syslog extracts, writing scripts to query device MIBs, and facilitating hands-on classes at LISA, Cascadia, and Sharkfest.

Chris Shaiman, Watchguard Technologies

Chris Shaiman is the Lead Network Engineer and Security Analyst for WatchGuard Technologies. Previously, he worked at Cingular Wireless and Xerox. Specializing in Networking and Security, he recently acquired his Certified Information Systems Security Professional certificate and has been enjoying building a more robust security program at his current employer. When not focusing on his love for networking and security related sciences, he is working on completing his hot yoga teacher training, and bioponic research.

T1
Myth-Busting: The Network Layer
Stuart Kendrick, EMC Isilon

Stuart Kendrick works as a Sustaining Engineer for EMC Isilon. He has worked in software development, help desk, desktop support, system administration, and network support. He spent two decades at the Fred Hutchinson Cancer Research Center in a multi-disciplinary role covering transport, network management, root cause analysis, and Problem Management. He is happiest when correlating packet traces with syslog extracts, writing scripts to query device MIBs, and facilitating hands-on classes at LISA, Cascadia, and Sharkfest.

 
Chris Shaiman, Watchguard Technologies

Chris Shaiman is the Lead Network Engineer and Security Analyst for WatchGuard Technologies. Previously, he worked at Cingular Wireless and Xerox. Specializing in Networking and Security, he recently acquired his Certified Information Systems Security Professional certificate and has been enjoying building a more robust security program at his current employer. When not focusing on his love for networking and security related sciences, he is working on completing his hot yoga teacher training, and bioponic research.

9:00 am–5:00 pm

Willow Room A

Our industry has its share of urban legends. In this tutorial, we split into teams, pick an adventure, and practice a methodology to separate myth from reality. We then come together as a class, listen to each other's reports, discuss what is surprising and how the underlying technology actually works. By the end of the day, you've repeated this cycle several times, have experience employing the methodology, and are ready to apply these techniques back at the office. You bring basic equipment and complete your homework prior to class; we provide test environments and coaching.

Who should attend:

Sysadmins and network engineers involved in designing operational IT environments or troubleshooting client/server problems … and who enjoy discovering that sometimes what we believe just ain’t so.

Take back to work:

Practice using a methodology for sanity-checking performance claims, a deeper understanding of popular technologies, and experience working in ad-hoc teams.

Topics include:

Your team will pick several adventures during the course of the day:

  • Jumbo Frames: Do they make your server go faster?
  • Packet loss: When does it start to matter?
  • Hops: How many switches does it take to slow down your server?
  • Contention: When the pipe is full, how much of a difference does it make?
  • Firewalls: The bad boys of the network -- how much do they skim off the top?
  • WAN Compression: How much does this buy us?
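For the packet-loss adventure, a handy sanity check before lab time is the Mathis et al. approximation: steady-state TCP throughput is capped at roughly (MSS/RTT) · C/√p, so a 100× drop in loss buys only 10× more throughput. A sketch (the constant and inputs are illustrative):

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Mathis et al. ceiling on steady-state TCP throughput, in bits/sec."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# 1460-byte MSS at 100 ms RTT: compare 0.01% loss against 1% loss
clean = tcp_throughput_bps(1460, 0.100, 0.0001)
lossy = tcp_throughput_bps(1460, 0.100, 0.01)
print(f"{clean / 1e6:.1f} Mb/s vs {lossy / 1e6:.1f} Mb/s")
```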
Additional Materials:

Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

T2
Defending Against the Dark Arts
Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

9:00 am–5:00 pm

Grand Ballroom C

Today's threats to the enterprise are manifested in many ways, but all share similar traits: they are highly intelligent, well funded, and determined to gain access. In this class, we will explore the murky world of the black hats. We will examine your security footprint as they view it and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. As with all my classes, this will be accompanied by a pinch of humor and a large dollop of common sense.

Who should attend:

Participants should be beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to understand their potential adversaries. It is suggested that participants have experience with the *nix command line and virtual hosts. This course is a continuation of, and will build on, what is taught in Hands on Security for System Administrators. However, attending that course is not a prerequisite.

Take back to work:

Tools, tips, tricks, and a working security toolkit that can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

Topics include:
  • Understanding an attack from beginning to end
  • Security Podiatry
  • Social Engineering detection and prevention
  • Detecting undesirable network and server behavior
  • Detecting penetration attempts and successes, and stopping them
  • Raising awareness
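Detecting undesirable server behavior often begins with nothing fancier than counting log lines. A sketch that flags source IPs with repeated failed SSH logins (the log lines are fabricated samples in the usual auth.log style):

```python
import re
from collections import Counter

FAILED = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def suspicious_ips(log_lines, threshold=3):
    """Return source IPs with at least `threshold` failed password attempts."""
    hits = Counter(m.group(1) for line in log_lines
                   if (m := FAILED.search(line)))
    return {ip for ip, count in hits.items() if count >= threshold}

sample = ["sshd[99]: Failed password for root from 203.0.113.9 port 4022 ssh2"] * 4
sample += ["sshd[12]: Failed password for admin from 198.51.100.7 port 4100 ssh2"]
assert suspicious_ips(sample) == {"203.0.113.9"}
```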
Additional Materials:

Half Day Morning

Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps and pre- and post-sales delivery, at RMI Corp (acquired by NetLogic), and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

T3
SDN Essentials I
Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

 
Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps and pre- and post-sales delivery, at RMI Corp (acquired by NetLogic), and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

9:00 am–12:30 pm

Grand Ballroom B

This course will introduce attendees to a disruptive transformation underway in networking. They will learn the fundamentals of Software-Defined Networking (SDN), including its motivation, basic building blocks, and standard abstractions. Bring your laptop to participate in exercises.

Who should attend:

Any attendee of LISA’14 who is interested in learning about how SDN and OpenFlow present major transformative opportunities to networking professionals.

Take back to work:

The fundamental understanding of Software-Defined Networking (SDN) including its motivation, basic building blocks and standard abstractions.

Topics include:
  • Introduction to SDN
  • What is SDN? 
  • SDN Architecture
  • Benefits of SDN
  • The SDN Stack
  • SDN Use Cases and Early Deployments
  • Wrap-Up: Thinking Differently about Networking
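The standard abstraction at the core of OpenFlow is the flow table: prioritized match rules mapped to actions, installed by a controller. A toy Python lookup (the field names and actions are simplified stand-ins):

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority rule whose match fields
    all agree with the packet; an empty match is a table-miss catch-all."""
    for priority, match, action in sorted(flow_table, key=lambda r: r[0],
                                          reverse=True):
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "drop"

table = [
    (100, {"dst_ip": "10.0.0.2"}, "output:port2"),  # specific forwarding rule
    (10,  {},                     "controller"),    # miss: punt to controller
]
assert lookup(table, {"dst_ip": "10.0.0.2"}) == "output:port2"
assert lookup(table, {"dst_ip": "10.0.0.9"}) == "controller"
```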
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

T4
Build a Sysadmin Sandbox
David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

9:00 am–12:30 pm

Grand Ballroom A

Every sysadmin I know has at least four new technologies they would like to play with—and that's just before breakfast. But finding the time to do this can be difficult, often because of the effort it takes to find a machine, set up the network connection, firewall it from the other machines, and so on... If you want to try something that requires multiple components, that's even worse.

What you need is a sandbox: a place where you can experiment with new technologies, mess with cool tools, test the latest versions of software, prototype new services, and so on, without any sort of heavyweight effort.

Who should attend:

Sysadmins who want to test technologies safely and efficiently.

Take back to work:

The ability to build your own sysadmin sandboxes.

Topics include:
  • Using software and services like Git, Vagrant, Amazon EC2, and Docker to make sandbox construction easy and free/inexpensive
  • When a sandbox approach is and is not appropriate
  • Migrating your work in and out of sandboxes
  • How to share the work you've done in a sandbox with others
Additional Materials:

Carolyn Rowland, National Institute of Standards and Technology (NIST)

Carolyn Rowland began her UNIX system administration career in 1991 and currently leads an ops/dev team at the National Institute of Standards and Technology (NIST). She credits her success with being able to be the bridge between senior management and technology. Her team has distinguished itself as a leader in the development of new technology solutions that solve business and research problems within the Engineering Laboratory and across the NIST campus.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

T5
Navigating the Business World
Carolyn Rowland, National Institute of Standards and Technology (NIST)

Nicole Forsgren Velasquez, Utah State University

9:00 am–12:30 pm

Grand Ballroom D

This tutorial provides guidelines and suggestions to improve the communication and collaboration of sysadmins and directly increase effectiveness when working with senior management.

We will provide you with skills that are associated with senior IT operations staff. The focus is on understanding the gap between IT operations and management and how to bridge that gap to increase respect for IT operations and improve your relationship with management. We will cover identifying your role within the organization, effective communication with upper management, positioning yourself to increase your visibility, and becoming the go-to person/group for IT within your organization. The instructors make this an interactive class by building on attendees' real situations, including specific ideas to help address immediate issues.

Who should attend:

Technical staff who want or need to work more effectively with senior management.

Take back to work:

The ability to position IT operations to be a valuable partner in your organization. Expect to think differently about IT operations and business when you leave the class.

Topics include:
  • The divide between management and IT operations
  • Challenges that impact IT operations teams
  • Ways to change the perception of IT operations in your organization
  • The importance of culture
  • How DevOps lessons translate to business
Additional Materials:

Half Day Afternoon

Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps, and pre- and post-sales delivery; he has also held positions at RMI Corp (acquired by NetLogic) and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

T6
SDN Essentials II
Saurav Das, SDN Academy

Aseem Parikh, SDN Academy

1:30 pm–5:00 pm

Grand Ballroom B

This course will build on the morning tutorial, and use hands-on exercises to delve into the details of the various components of the SDN stack and OpenFlow API. Bring your laptop to participate in exercises.

Who should attend:

DevOps engineers, network admins, network planners, architects, and software engineers who want a deeper dive into the "how" of SDN.

Take back to work:

Experience with tools and simulation environment to start exploring SDN technologies.

Topics include:
  • Review of SDN and Architecture
  • Components of SDN Stack
  • Intro to OpenFlow 
  • Play with OpenFlow (Hands-on exercise)
  • The SDN Stack: Switches
  • The SDN Stack: Network Operating System/Controllers
  • Build a Learning Switch (Hands-on exercise)
  • Network Functions Virtualization (NFV)
  • SDN - A Paradigm Shift
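The "Build a Learning Switch" exercise centers on a simple idea: learn which port each source MAC address arrives on, forward out a known port, and flood otherwise. A minimal sketch of that logic, assuming nothing about any particular controller framework (the Frame and Switch types and the port numbering are illustrative, not the exercise's actual code):

```go
package main

import "fmt"

// Frame carries the fields a learning switch cares about.
type Frame struct {
	Src, Dst string // MAC addresses
	InPort   int    // ingress port
}

// Switch holds the learned MAC-to-port table.
type Switch struct {
	macTable map[string]int
	numPorts int
}

func NewSwitch(ports int) *Switch {
	return &Switch{macTable: map[string]int{}, numPorts: ports}
}

// Forward learns the source MAC, then either forwards the frame out a
// known port or floods it to every port except the ingress port.
func (s *Switch) Forward(f Frame) []int {
	s.macTable[f.Src] = f.InPort // learn where Src lives
	if port, ok := s.macTable[f.Dst]; ok {
		return []int{port}
	}
	out := []int{}
	for p := 0; p < s.numPorts; p++ {
		if p != f.InPort {
			out = append(out, p) // flood
		}
	}
	return out
}

func main() {
	sw := NewSwitch(4)
	fmt.Println(sw.Forward(Frame{Src: "aa", Dst: "bb", InPort: 1})) // unknown dst: flood
	fmt.Println(sw.Forward(Frame{Src: "bb", Dst: "aa", InPort: 2})) // known dst: one port
}
```

In an OpenFlow setting the same table lives in the controller, which reacts to packet-in events and installs flow entries rather than forwarding frames itself.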
Additional Materials:

Chris "Mac" McEniry, Sony Network Entertainment

Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

T7
A Quick Introduction to System Tools Programming with Go
Chris "Mac" McEniry, Sony Network Entertainment

1:30 pm–5:00 pm

Grand Ballroom D

This tutorial provides an introduction to Go with a focus on using it for everyday tooling.

Who should attend:

Sysadmins who want to develop their own tooling, or have a general interest in the Go programming language.

Take back to work:

Ability to read and write the Go language. Create, understand, and extend tools written in Go.

Topics include:
  • Understanding of the applicability of Go to everyday work
  • The Go environment and toolset
  • Go language fundamentals:
  • Control Structures
  • Functions
  • Data Types
  • Methods and Interfaces
  • Goroutines
  • Channels
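To give a flavor of the last two topics, goroutines and channels compose naturally into a worker pool. A minimal, self-contained sketch (the worker function and the squaring "jobs" are invented for illustration, not course material):

```go
package main

import "fmt"

// worker consumes jobs from one channel and sends results on another.
// The range loop exits when the jobs channel is closed.
func worker(id int, jobs <-chan int, results chan<- int) {
	for j := range jobs {
		results <- j * j // square each job
	}
}

func main() {
	jobs := make(chan int, 5)
	results := make(chan int, 5)

	// Start three concurrent workers.
	for w := 1; w <= 3; w++ {
		go worker(w, jobs, results)
	}

	// Send the work, then close the channel so the workers finish.
	for j := 1; j <= 5; j++ {
		jobs <- j
	}
	close(jobs)

	// Collect all five results; arrival order is nondeterministic,
	// but the sum is not.
	sum := 0
	for i := 0; i < 5; i++ {
		sum += <-results
	}
	fmt.Println(sum) // 1+4+9+16+25 = 55
}
```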
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

T8
How To Not Get Paged: Managing Oncall to Reduce Outages
Thomas A. Limoncelli, Stack Exchange, Inc.

1:30 pm–5:00 pm

Grand Ballroom A

People think of “oncall” as responding to a pager that beeps because of an outage. In this class you will learn how to use oncall as a vehicle to improve system reliability so that you get paged less often.

This talk includes never-before-seen material from the new book The Practice of Cloud System Administration by Limoncelli, Chalup, and Hogan.

Who should attend:

Anyone with an oncall responsibility (or their manager).

Take back to work:
  • How to monitor more accurately so you get paged less
  • How to design an oncall schedule so that it is more fair and less stressful
  • How to assure preventative work and long-term solutions get done between oncall shifts
  • How to conduct “Fire Drills” and “Game Day Exercises” to create antifragile systems
  • How to write a good Post-mortem document that communicates better and prevents future problems
Topics include:
  • Why your monitoring strategy is broken and how to fix it
  • Building a more fair oncall schedule
  • Monitoring to detect outages vs. monitoring to improve reliability
  • Alert review strategies
  • Conducting “Fire Drills” and “Game Day Exercises”
  • "Blameless Post-mortem documents"
Additional Materials:
Monday, February 16, 2015

Half Day Morning

Eno Thereska, Microsoft Research

Eno Thereska is a Researcher at Microsoft Research in Cambridge, UK. He has broad interests in computer systems. He has over 30 academic publications in top conferences in the field of storage systems and operating systems, including FAST, OSDI, SOSP, SIGMETRICS, and CHI. He served as technical co-chair of the File and Storage Systems Conference (FAST '14). Eno is a recipient of the 2014 IEEE William R. Bennett Prize, the IEEE INFOCOM 2011 Best Paper award, and the USENIX FAST Best Paper and Best Student Paper awards in 2005 and 2004, respectively. He graduated with a Ph.D. from Carnegie Mellon University in 2007.

Greg O'Shea, Microsoft Research

Greg O'Shea is a software engineer in the Systems and Networking group at Microsoft Research, Cambridge, UK. He has worked extensively in developing and evaluating experimental network and storage systems and has published his findings in SIGCOMM, NSDI, MobiCom, OSDI, and SOSP. Greg’s work has been incorporated into several Microsoft products, such as Windows, Hyper-V, and Windows Server. His latest work is on Storage Quality of Service and is included in Windows Server Technical Preview. He has also developed the Microsoft Research Storage Toolkit, a development kit for software-defined storage. Greg has a Ph.D. from London University.

M1
Software-defined Storage from an SDN Perspective
Eno Thereska, Microsoft Research

Greg O'Shea, Microsoft Research

9:00 am–12:30 pm

Grand Ballroom B

This tutorial will provide technical background on the (often-vague) concept of software-defined storage (SDS). The technical contribution of this tutorial is a definition of SDS that builds on recent work in network systems and applies it to storage. This work includes basic concepts such as classification, routing and forwarding, and the separation of control and data planes. Surprisingly, these basic concepts do not apply well to the storage stack today, making it difficult to enforce end-to-end storage policies.
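As a rough illustration of those network concepts carried over to storage (this sketch is not from the tutorial materials): a control plane installs classification rules, and a data plane matches each IO request against them to pick a policy. The Rule and IORequest types and the policy names are all hypothetical.

```go
package main

import "fmt"

// IORequest is an illustrative stand-in for an IO flowing down the
// storage stack.
type IORequest struct {
	Tenant string
	Op     string // "read" or "write"
}

// Rule is a classification entry, analogous to a flow-table entry.
type Rule struct {
	Tenant string
	Op     string
	Policy string // e.g. a bandwidth or priority class
}

// DataPlane holds the rules the control plane has installed.
type DataPlane struct{ rules []Rule }

// Install is the control-plane action: add a classification rule.
func (d *DataPlane) Install(r Rule) { d.rules = append(d.rules, r) }

// Classify is the data-plane action: first matching rule wins,
// otherwise fall back to a default policy.
func (d *DataPlane) Classify(req IORequest) string {
	for _, r := range d.rules {
		if r.Tenant == req.Tenant && r.Op == req.Op {
			return r.Policy
		}
	}
	return "best-effort"
}

func main() {
	dp := &DataPlane{}
	dp.Install(Rule{Tenant: "vm1", Op: "write", Policy: "high-priority"})
	fmt.Println(dp.Classify(IORequest{Tenant: "vm1", Op: "write"})) // high-priority
	fmt.Println(dp.Classify(IORequest{Tenant: "vm2", Op: "read"}))  // best-effort
}
```

The tutorial's point is that real storage stacks lack exactly this clean separation, which is what makes end-to-end policy enforcement hard.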

There will be a short, hands-on exercise that requires Windows 8.1.

Download the presentation slides (PPTX) for this tutorial.

Additional Materials:

Brent Welch, Google

Brent Welch is a senior staff software engineer at Google. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

M2
Clustered and Parallel Storage System Technologies
Brent Welch, Google

9:00 am–12:30 pm

Grand Ballroom C

This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Lustre, GPFS, PanFS, HDFS (Hadoop File System), OpenStack, and the NFSv4.1 standard for parallel I/O.

Additional Materials:

Half Day Afternoon

Sam H. Noh, Hongik University

Sam H. (Hyuk) Noh received his B.S. in Computer Engineering from Seoul National University in 1986, and his Ph.D. from the Department of Computer Science, University of Maryland, College Park, in 1993. He has been a professor in the School of Computer and Information Engineering at Hongik University in Seoul, Korea, since 1994. He has worked on various software issues pertaining to flash memory since 1999, authoring numerous papers and holding numerous patents in that area. He has served as General Chair, Program Chair, and Program Committee member for a number of technical conferences and workshops, including the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), the IEEE International Conference on Parallel and Distributed Systems (ICPADS), the USENIX Conference on File and Storage Technologies (FAST), and the International World Wide Web (WWW) Conference. He also serves as Associate Editor of ACM Transactions on Storage. His other current research interests include operating system issues pertaining to non-volatile memory, such as PCM and STT-MRAM.

Yang-Suk Kee, Samsung Semiconductor, Inc.

Dr. Yang-Suk Kee (Yang Seok Ki) is a director and architect of the Memory Solutions Lab, Samsung Semiconductor Inc. America. He leads the Advanced Datacenter Solutions group, whose main focus is to innovate the SSD ecosystem, and drives a storage-centric computing paradigm called Smart SSD. Before joining Samsung, he worked for Oracle's server technology group, which builds a distributed database server system, and contributed to the Oracle 12c release. Prior to his industrial experience, he worked on HPDC (High Performance Distributed Computing), Grid, and Cloud research at the Information Sciences Institute of the University of Southern California and the Center for Networked Systems, University of California, San Diego. He received his Ph.D. in Electrical Engineering and Computer Science in parallel processing, his M.S. in Computer Engineering, and his B.S. in Computer Engineering from Seoul National University, Korea.

M3
Flash Memory and Its By-products: A to Z in a Flash
Sam H. Noh, Hongik University

Yang-Suk Kee, Samsung Semiconductor, Inc.

1:30 pm–5:00 pm

Grand Ballroom B

This tutorial will be a crash course on flash memory. We will cover the major ground related to flash memory-based products, starting from the intrinsic characteristics of flash memory devices, moving up to the FTL firmware that controls them, and finally to the system software layer that makes use of flash memory-based end products. We start off covering the history and the very basics of each layer, then discuss the recent trends in each layer and how the layers differ across the various flash products that are commercially available. Finally, we will attempt to untangle the close-knit relationship among the system, the software, the interface, and the market that together produce flash memory-based end products and the software systems that use them.

Additional Materials:
M4
Architecting Applications on Hadoop
1:30 pm–5:00 pm

Grand Ballroom C

During the first half of the tutorial, we will provide an intro to Apache Hadoop and the ecosystem. In the second half, we will show, using an end-to-end application of clickstream analytics, how users can:

  • Model data in Hadoop and select optimal storage formats for data stored in Hadoop
  • Move data between Hadoop and external systems such as relational databases and logs
  • Access and process data in Hadoop
  • Orchestrate and schedule workflows on Hadoop

Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.

Students should bring laptops with a copy of the Cloudera Quickstart VM (or access to a working alternate VM or Hadoop cluster). The VM can be downloaded from here.

Requirements are:
These are 64-bit VMs. They require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
To use a VMware VM, you must use a player compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can be used to create a new VM using the same virtual disk (VMDK file), but some features in VMware Tools won't be available.

The VM and file size vary according to the CDH version as follows:

CDH and Cloudera Manager Version                                   RAM Required by VM   File Size
CDH 5 and Cloudera Manager 5                                       4 GB                 3 GB
CDH 4, Cloudera Impala, Cloudera Search, and Cloudera Manager 4    4 GB                 2 GB
Additional Materials:
Sunday, November 8, 2015

Full Day

Salim Virji, Google

Salim Virji is a Site Reliability Engineer at Google. He has worked on infrastructure software, back-end systems, front-end applications, and various pieces of glue to connect them all. He lives and works in New York City.

S1
SRE Classroom
Salim Virji, Google

9:00 am–5:00 pm

Thurgood Marshall West

With this hands-on tutorial, you will develop an understanding for designing, building, and running reliable Internet services at a large scale.

Who should attend:

This tutorial is suitable for executives who need to specify and evaluate systems, engineers who build systems, and IT professionals who want to run first-class services built with reliable systems.

Take back to work:

You will take back an understanding of how to evaluate system designs; how to specify and build large systems; and how to operate these systems in the real world, in a way that will scale as the system grows.

Topics include:
  • Designing Reliable Systems
  • Building Reliable Systems
  • Running Reliable Systems
Additional Materials:

You may want a text editor and a calculator, but most of this work can be done with pencil and paper.

Tyler Fitch, Chef

Tyler is a member of Chef’s Customer Engineering team, championing successful patterns and delightful experiences in automation to enterprise customers large and small. Prior to working at Chef, he spent a decade as an engineer for Adobe, developing and automating commerce services for adobe.com using a variety of technologies. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

S3
Automation Tools Bootcamp
Tyler Fitch, Chef

9:00 am–5:00 pm

Thurgood Marshall South

The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new development tools. We will learn about and then use Git, Vagrant, Chef, Packer, Docker, and Jenkins to build a small application in local VMs.

We will cover a progression of tasks, leveraging information from previous sections to build a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

Who should attend:

Operations, QA, developers, those who choose to call themselves DevOps, and even managers can come learn.

Take back to work:

These automation tools are freely available to engineers, enabling them to safely break environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

Topics include:
  • Git
  • Vagrant
  • Chef
  • Packer
  • Docker
  • Jenkins
Additional Materials:

Having Vagrant and VirtualBox installed will allow students to use a VM that will be provided. The VM will have everything else pre-installed. The VM will be available for download before or after the class. URL TBA.

Half Day Morning

Chris "Mac" McEniry, Sony Network Entertainment

Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

S2
Go for Sysadmins
Chris "Mac" McEniry, Sony Network Entertainment

9:00 am–12:30 pm

Thurgood Marshall North

Every day, sysadmins find the need to build tools that get the job done in a fast and efficient way. Go is a new and powerful—yet simple—language that is an excellent fit for the needs of the harried sysadmin. This tutorial provides an introduction to Go with a focus on using it for everyday tooling.

Who should attend:

Sysadmins interested in the Go language for applications and tooling. Some experience with programming or scripting languages such as Perl, Python, or Ruby is recommended.

Take back to work:

An introductory understanding of the Go language, and several examples of common problems to learn from.

Topics include:
  • Understanding of the applicability of Go to everyday work
  • The Go environment and toolset
  • Go language fundamentals:
  • Control Structures
  • Functions
  • Data Types
  • Methods and Interfaces
  • Goroutines
  • Channels
  • Examples in Go:
  • Email
  • Web Server
  • Directory Tools
Additional Materials:

A current Linux VM with golang installed is recommended; see http://golang.org/doc/install. However, the exercises can also be worked through using the packages from that site directly.

Rajesh Joseph, Red Hat, Inc.

Rajesh Joseph is a Principal Software Engineer at Red Hat. He is currently working with the Protocols Team for Red Hat Gluster Storage (RHGS) that is responsible for implementation of the SMB and NFS family of protocols for RHGS. He is also the maintainer of the Snapshot feature in GlusterFS. He has 13 years of industry experience in information security, cryptography, storage, and file-system domain. Prior to Red Hat, he worked with Cisco and Nokia.

Poornima Gurusiddaiah, Red Hat, Inc.

Poornima Gurusiddaiah works as a software engineer at Red Hat. With two years of development experience with Gluster, she has contributed to various Gluster projects including Samba Gluster integration, libgfapi, and multiprotocol for GlusterFS. Prior to Red Hat, she worked with HP on an openVMS kernel project. Poornima's areas of interest include Gluster, storage, Samba and related technologies.

S4
GlusterFS: A Scale-out Software Defined Storage
Rajesh Joseph, Red Hat, Inc.

Poornima Gurusiddaiah, Red Hat, Inc.

9:00 am–12:30 pm

Thurgood Marshall East

Learn about GlusterFS, a scale-out, software-defined storage platform capable of scaling to many petabytes. GlusterFS aggregates storage building blocks on diverse commodity hardware over InfiniBand RDMA or TCP/IP interconnect in a single global namespace. GlusterFS is based on a stackable user space design and can deliver exceptional flexibility and performance for diverse workloads.

This tutorial will combine lecture with instructor-led demonstrations using GlusterFS.

Who should attend:
  • Any system/storage administrator who wants an in-depth understanding of GlusterFS.
  • Anyone who wants to know how GlusterFS can help their storage infrastructure and contain costs.
Take back to work:
  • Thorough understanding of GlusterFS
  • Virtual Machine images for playing with GlusterFS
  • Lab instructions on how to set up and play with a test cluster
Topics include:
  • Overview of the GlusterFS architecture
  • Features offered by GlusterFS
  • Different access methods to GlusterFS (Samba, NFS, FUSE)
  • Hands-on using GlusterFS
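As a flavor of the hands-on portion, a minimal sketch of bringing up a two-node replicated volume with the stock CLI (hostnames, brick paths, and the mount point are hypothetical):

```shell
# on server1: add the second node to the trusted storage pool
gluster peer probe server2

# create and start a 2-way replicated volume, one brick per node
gluster volume create gv0 replica 2 \
    server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0

# on a client: mount the volume over FUSE into the single global namespace
mount -t glusterfs server1:/gv0 /mnt/gluster
```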
Additional Materials:

Lee Damon, University of Washington

Lee Damon chaired the SAGE policies and ethics working groups that developed the original SAGE/LOPSA code of ethics. He has a B.S. in Speech Communication from Oregon State University. A UNIX system administrator since 1985, he has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and Qualcomm. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. He chaired LISA '04 and co-chaired CasITconf '11, '13, and '14.

S5
Professional Conduct and Computer Ethics
Lee Damon, University of Washington

9:00 am–12:30 pm

Lincoln 4

This introductory tutorial will start by examining some of the ethical responsibilities that come along with access to other users' data, accounts, and confidential information. We will look at several case studies involving both local and Cloud usage. All attendees are strongly encouraged to participate in the discussion. Numerous viewpoints will be considered in order to give students a perspective from which to develop their own reasoned response to ethical challenges.

The answers to these and other problems are often far more complicated than one would initially guess.

There are no prerequisites for this class.

Who should attend:

People in any of the following situations would benefit from this tutorial:

  • system administrators
  • managers of system administrators
  • those who have access to personal/confidential information
  • those who make policy decisions about computer systems and their users

Take back to work:

After completing this tutorial, students will be better able to resolve questionable situations and will have the means to support their decisions.

Topics include:
  • Why it is important to set your ethical standards before issues arise
  • Who is impacted by "expectations of ethical conduct"
  • Why this isn't just an expectation of system administrators
  • Implicit expectations of ethical behavior
  • Ethics and the Cloud
  • Coercion to violate ethics
  • Well-intentioned violations of privacy
  • Collection, retention, and protection of personal data
  • Management directives vs. friendships
  • Software piracy/copying in a company/group/department
Additional Materials:

Half Day Afternoon

David Josephsen, Librato

As the developer evangelist for Librato, Dave Josephsen hacks on tools and documentation, writes about statistics, systems monitoring, alerting, metrics collection and visualization, and generally does anything he can to help engineers and developers close the feedback loop in their systems. He’s written books for Prentice Hall and O’Reilly, speaks Go, C, Python, Perl and a little bit of Spanish, and has never lost a game of Calvinball.

S6
Neighborly Nagios
David Josephsen, Librato

1:30 pm–5:00 pm

Thurgood Marshall East

Neighborly Nagios is a beginner-level tutorial on the venerable monitoring system. Rather than teaching Nagios in a vacuum, this tutorial emphasizes the important role played by centralized, polling-based monitoring systems like Nagios in the context of a modern engineering organization. We will focus on the hooks and I/O intricacies systems engineers need to make Nagios play nicely with other commonly used monitoring tools.

Who should attend:

Sysadmins interested in learning about Nagios.

Take back to work:

Attendees will leave with a solid conceptual understanding of Nagios, as well as of centralized, polling-based monitoring systems in general. They will have configured Nagios to run local and remote service and host checks, and to send email and webhook notifications. They will also have configured Nagios to work with several visualization systems and to emit logs into a centralized event-logging infrastructure.

Topics include:

Why learn Nagios?

  • Centralized pollers in theory and practice

Reality through the eyes of Nagios

  • Hosts, services, and plugins
  • Server-side vs. client-side plugins and remote execution
  • Active polling vs. passive checks
  • People (contacts, contact groups, and notification commands)

Installing and configuration

  • Nagios configuration layout and templating
  • Macros and commands
  • Templates for hosts, services, and contacts
  • A simple template setup using hostgroups
  • A flexible template setup
  • Managing your configs
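The template mechanism can be sketched with a minimal object configuration (host names, addresses, and group names here are hypothetical):

```
define host {
    name                 linux-base       ; a template, not a monitored host
    check_command        check-host-alive
    max_check_attempts   3
    contact_groups       admins
    register             0                ; templates themselves are never registered
}

define host {
    use                  linux-base       ; inherit every directive above
    host_name            web1
    address              192.0.2.10
    hostgroups           web-servers
}
```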

Data-In: Plugins and data collection frameworks

  • Three simple, server-side plugins
  • NRPE
  • NRDP
  • Check_MK
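A server-side plugin is just a program that prints one status line and exits with a conventional code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). A minimal sketch in Python, with hypothetical disk-usage thresholds:

```python
# Nagios plugin convention: the exit code carries the state, stdout carries a
# one-line summary; anything after "|" is perfdata for graphing tools.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_disk_pct(used_pct, warn=80, crit=90):
    """Return (exit_code, output_line) for a disk-usage percentage."""
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    line = "DISK %s - %d%% used | disk=%d%%;%d;%d" % (
        label, used_pct, used_pct, warn, crit)
    return code, line

# a real plugin would measure usage (e.g. via os.statvfs), print the line,
# and call sys.exit(code)
```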

Data-Out: Hooks and data export strategies

  • Performance Data and Time Series
  • Nagios Event Radio Dispatch (NERD)
  • PNP4Nagios and RRDTool
  • Graphios and Graphite
  • MK LiveStatus

Neighborly Nagios: One part of an effective monitoring infrastructure

  • Data as a first-class citizen
  • Composable telemetry processing infrastructure
  • Intro to Statsd
  • Intro to Heka
  • Intro to Riemann
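Statsd, for instance, accepts metrics as tiny plain-text UDP datagrams of the form name:value|type (c = counter, g = gauge, ms = timer). A minimal Python emitter (the metric names are made up):

```python
import socket

def statsd_packet(name, value, mtype):
    """Encode one metric in the statsd line protocol."""
    return ("%s:%s|%s" % (name, value, mtype)).encode("ascii")

def send_metric(addr, name, value, mtype="c"):
    # UDP is deliberate: emitting metrics can never block the sender
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(statsd_packet(name, value, mtype), addr)
    sock.close()

# e.g. send_metric(("localhost", 8125), "nagios.checks.run", 1)
```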
Additional Materials:

Attendees are encouraged to have a laptop with Vagrant installed.

Adam Moskowitz, MathWorks

Adam Moskowitz is a Senior Tools Engineer at MathWorks and was the program chair of the LISA ‘09 conference. Since entering the field in 1978, he has been a computer operator (what sysadmins were called in The Good Old Days), an application developer, a system administrator, and a teacher of all of those things. At MathWorks, he is building a system to manage containers in both production and ephemeral testing environments (and to quietly take over the world). When he’s not writing code, you will find him judging barbecue contests or riding a tandem bicycle with his wife.

S7
Software Testing for Sysadmin Programs
Adam Moskowitz, MathWorks

1:30 pm–5:00 pm

Thurgood Marshall North

This tutorial will focus on how to incorporate testing into the software that sysadmins develop. Programs typically written by sysadmins tend to be less amenable to testing for a variety of reasons: choice of language, lack of "OO-ness," and heavy interaction with "system programs" (things like "mount" and "df"). This class will address these issues and show ways to incorporate testing into existing programs without having to rewrite them; it will also demonstrate how to use testing when developing new programs when "going OO" isn't an option. The session will consist of lecture, discussion, and hands-on exercises.

Who should attend:

System administrators of any level with at least some experience writing programs in pretty much any scripting language (Python, Ruby, Perl, Bash, or Groovy).

Take back to work:
  • An understanding of software testing and test-driven development, and how to incorporate these concepts into system administration
  • Suggestions for making future programs more testable
  • Exposure to a testing framework (Maven plus Spock)
  • Experience developing basic unit tests
  • Some exposure to simple "mocking" techniques
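As a taste of the "mocking" bullet above — sketched here in Python's unittest.mock purely for flavor, rather than the class's Maven/Spock/Groovy stack — the idea is to isolate logic that shells out to a "system program" (a hypothetical df wrapper) so its parsing is testable without running the real command:

```python
import subprocess
from unittest import mock

def disk_used_pct(mount):
    """Parse the Use% column for one mount point out of `df -P` output."""
    out = subprocess.check_output(["df", "-P", mount], text=True)
    # the data row is last; Use% is the second-to-last column, e.g. "83%"
    return int(out.strip().splitlines()[-1].split()[-2].rstrip("%"))

FAKE_DF = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 100000 83000 17000 83% /var
"""

# replace the call to df with canned output -- no real df is executed
with mock.patch("subprocess.check_output", return_value=FAKE_DF):
    assert disk_used_pct("/var") == 83
```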
Topics include:

Motivation and benefits

  • Why use testing
  • Why testing is needed
  • How it can help
  • Why it's not as difficult as people think

Basic testing

  • Intro to Maven (the test "runner")
  • Intro to Spock (the test framework)
  • Intro to Groovy (the testing language)
"Mocks" (or "Mocking")

  • How to simulate interaction with other programs
Additional Materials:

    Mike Ciavarella, Coffee Bean Software Pty Ltd

    Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

    S8
    Documentation Techniques for Sysadmins
    Mike Ciavarella, Coffee Bean Software Pty Ltd

    1:30 pm–5:00 pm

    Lincoln 4

    Sysadmins freely acknowledge how important documentation is to their daily lives, and in the same sentence will loudly complain that they don’t have time to produce documentation. This class is about how to produce effective, useful and timely documentation as part of your normal sysadmin activities. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

    Who should attend:

    System administrators of all types and levels who need to produce documentation for the systems they manage, or who want to improve their documentation skills. Documentation can be the difference that turns you from a good sysadmin into a great sysadmin!

    Take back to work:
    • The skills to improve personal and team documentation quality
    • A solid understanding of how to establish and maintain effective documentation practices
    Topics include:
    • Why system administrators need to document
    • Documentation as part of your daily workflow
    • Targeting your audience
    • Common mistakes made in documentation
    • Tools to assist the documentation process (including effective use of wikis)
    Additional Materials:
    Monday, November 9, 2015

    Full Day

    Caskey L. Dickson

    Caskey Dickson is an engineer with an MBA. Currently, he is a Site Reliability Engineer/Software Engineer at Google, where he works writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines.

    Caskey has worked in online service development since 1995. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Masters in Systems Engineering, and an M.B.A from Loyola Marymount.

    M1
    Everything an Administrator Wanted to Know about Operating System Internals but Was Afraid to Ask
    Caskey L. Dickson

    9:00 am–5:00 pm

    Thurgood Marshall South

    The Morning Session (Part 1) of this tutorial is FULL.

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause behind "it's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis. As an added bonus, it will prime you for the following day's "Linux Performance Tuning" tutorial with Theodore Ts'o.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.

    Buckle up.
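As one concrete taste of that material, a minimal FIFO page-replacement simulator showing Bélády's anomaly: on the classic reference string, adding a fourth frame makes the fault count go up, not down.

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with `frames` physical frames."""
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:        # memory full: evict the oldest page
                resident.discard(order.popleft())
            resident.add(page)
            order.append(page)
    return faults

REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# fifo_faults(REFS, 3) == 9, but fifo_faults(REFS, 4) == 10 -- more RAM, more misses
```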

    Who should attend:
    • All admins who did not take the Comp-Sci academic route and never had a course in OS internals
    • Inexperienced admins whose coursework or training didn't include the kind of OS internals that they should (modern OS courses have become a shadow of their former selves and commonly require writing no OS code)
    • More experienced admins who haven't really had to address these sorts of issues on a regular basis who probably know a lot about some individual aspects but could benefit from everything being put into a broader context
    Take back to work:

    Attendees will gain a deeper understanding of what goes on inside the kernel and the areas where things can go wrong. We'll explore how little the concept of "system load" captures about the true system state, and attendees will be prepared to improve both their operational response methodologies as well as their monitoring goals.

    Topics include:

    Morning:

    • Scheduling and Process Management
    • Memory Management and the MMU
    • Virtualization and its impact on these

    Afternoon:

    • File System Architecture for sysadmins, covering ext2/3/4, NTFS, and ZFS
    • Storage layer performance, disks, RAID, and SANs
    • The impact of virtualization on these
    Additional Materials:

    All attendees should have either a laptop running Linux, or a VM with a recent copy of Linux installed to assist in the hands-on exploration activities.

    Branson Matheson, sandSecurity

    Branson is a 27-year veteran of system architecture, administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines. He has also run his own company while continuing to support many open source projects. He is currently the CIO for Silent Circle, a privacy and security organization behind the Blackphone. Branson has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

    M3
    Defending against the Dark Arts
    Branson Matheson, sandSecurity

    9:00 am–5:00 pm

    Thurgood Marshall North

    Today's threats to the enterprise are manifested in many ways, but all share similar traits: highly intelligent, well funded and determined to gain access. In this class, we will explore the murky world of the blackhats. We will examine your security footprint as they view it, and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. As with all my classes, this will be accompanied with a pinch of humor and a large dollop of common sense.

    Who should attend:

    Beginning to high-level system administrators of any stripe with an interest in IT Security and a desire to understand their potential adversaries. It is suggested that participants have experience with *nix command line and virtual hosts.

    Take back to work:

    Tools, tips, tricks and a working security toolkit that can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

    Topics include:
    • Understanding an attack from beginning to end
    • Security Podiatry
    • Social Engineering detection and prevention
    • Detecting undesirable network and server behavior
    • Detecting penetration attempts and successes, and stopping them
    • Raising awareness
    Additional Materials:

    Bring a laptop with VMware Fusion or VirtualBox installed. You should have wireless or wired connectivity; we will have both available in the class. You will need to acquire the VM for the class beforehand; registered attendees will receive the download information via email.

    Half Day Morning

    Robert Ballance, Independent

    Dr. Robert Ballance honed his R-programming skills as a System Manager for High-Performance Computing Systems at Sandia National Laboratories, where he developed several R packages for system analysis and reporting. He has administered systems and managed, consulted, taught, and developed software, including R packages, Perl applications, compilers, and UNIX device drivers. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group.

    M2
    An Introduction to R for System Administrators
    Robert Ballance, Independent

    9:00 am–12:30 pm

    Thurgood Marshall West

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for ensuring reproducible and repeatable results.

    This tutorial is designed to

    • motivate you to pick up R,
    • demonstrate useful techniques using R, and
    • illustrate ways to simplify your life by automating data analysis and reporting.

    Examples will be based on situations that the instructor encountered during routine system operations. Additional exercises and data sets that students can explore following the workshop will be provided. The instructor will be available in the LISA Lab after the workshop. 

    Who should attend:

    System administrators who are awash in operational data and want to do a more effective job of understanding their data and communicating their findings should attend this class. Prior knowledge of R is not required, but if you are already working with R, you are welcome! Facility with programming and a knowledge of basic descriptive statistics will be assumed.

    Take back to work:

    This introduction to R and its ecosystem provides a walk along the R main line—coming up to speed on R, accessing data, analyzing data, and getting the message out. The key points include:

    • Acquaintance with R, R packages, and R Studio
    • Understanding where R fits into the system administrator’s tool set
    • Familiarity with basic R data manipulation techniques
    • Basic principles for ensuring reproducible and automated analyses 
    • Motivation to learn or improve your R skills
    • Next steps in mastering R
    Topics include:
    • Introduction to the R ecosystem (R, R Studio, CRAN)
    • Why should you consider R?
    • The R programming model: functions, tables, and packages 
    • The basic data analysis workflow
    • Reading and writing data from files and pipes
    • Data frames and data frame manipulations
    • Using the plyr and dplyr packages to slice and dice data
    • Using the ggplot2 package for graphing
    • Overview of the R package system
    • Other useful R packages
    Additional Materials:

    The following software packages should be installed in advance of the tutorial:

    R: Version 3 or later from CRAN or as supported by your OS. Some distributions have packages already available; others have both pre-compiled binaries and source code available from CRAN. R requires about 160MB (installed) on Mac OS X. Source code is available. License: GNU General Public License (GPL).

    Optional: R Studio. Requires R, plus an additional 305MB on Mac OS X. Binary installations are available for Ubuntu, Fedora, Mac OS X, and Windows. Source code is available. R Studio does not appear to be supported for the BSD distributions. License: GNU Affero General Public License.

    Once you have installed R and (optional) R Studio, you can download contributed packages. Class demonstrations will use plyr and ggplot2. Other packages will be introduced as needed.

    Todd Palino, Staff Site Reliability Engineer, LinkedIn

    Todd is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping the largest deployment of Apache Kafka, Zookeeper, and Samza fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Previously, Todd was a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company.

    M4
    Apache Kafka from Scratch
    Todd Palino, Staff Site Reliability Engineer, LinkedIn

    9:00 am–12:30 pm

    Thurgood Marshall East

    Apache Kafka is a publish/subscribe messaging system that is in use within companies including LinkedIn, Twitter, Netflix, and many others. It is used to build Extract, Transform, and Load (ETL) pipelines, collect metrics and logs, and queue data between applications, often providing the main backbone for moving data within big data infrastructures. This tutorial will focus on how to get started with Kafka, including working with ZooKeeper, which it depends on. We will cover installation, configuring retention and replication, and creating simple applications for producing and consuming messages.

    Who should attend:

    This tutorial is designed for engineers, both operations and development, who are new to Apache Kafka and publish/subscribe messaging. The only prerequisite knowledge for participating is the ability to install software and execute basic shell commands. The ability to write basic Python programs is helpful, but not required. Full working versions of all scripts used in the tutorial will be provided.

    Take back to work:

    Participants will leave the tutorial with an understanding of how to set up Apache ZooKeeper and Apache Kafka, and create message producers and consumers, having completed this work on their own systems. This will allow them to set up a publish/subscribe messaging infrastructure that can be used for myriad applications, including monitoring, logging, queuing, and tracking user-generated events.

    Topics include:

    ZooKeeper

    • What is ZooKeeper?
    • What is it NOT?
    • Standalone Setup

    Apache Kafka

    • Publish/Subscribe Messaging
    • Kafka Architecture
    • Installing Kafka

    Producing Messages

    • Message Schema
    • Using the Console Producer
    • Producing Inside Applications

    Consuming Messages

    • Using the Console Consumer
    • Consuming Inside Applications
    • Limitations of Non-Java Consumers

    Kafka Clusters

    • Adding a Second Broker (partner work)
    • Replicating Partitions
    • Creating Multiple Partitions

    Message Retention

    • Retention by Size
    • Retention by Key (log compacted)

    Use Cases

    • Monitoring
    • Log Collection
    • User-generated Events
    • Queuing
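The end-to-end flow can be sketched with the console tools that ship in the Kafka distribution (the relative paths and single-broker localhost setup are assumptions):

```shell
# start ZooKeeper and a single Kafka broker with the sample configs
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# create a one-partition topic
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic test --partitions 1 --replication-factor 1

# produce one message, then read everything back
echo "hello, kafka" | bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic test --from-beginning
```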
    Additional Materials:

    The exact requirements are to be determined. Each attendee will be required to bring a laptop, preferably Mac or Linux, on which they can install required software, including a Java development kit (if not already installed), Apache ZooKeeper, and Apache Kafka. More detailed instructions will be provided prior to the conference.

    Scott Cromar, Author, "From Techie to Boss"

    Scott Cromar is an experienced IT manager who still remembers what it was like to step into his first leadership position from a technical role. He has assembled diverse, multifunctional, globally distributed operational teams for several employers over his career, and he enjoys the challenge of creating a team from a group of talented individuals.

    M5
    Survival Guide for the New Manager
    Scott Cromar, Author, "From Techie to Boss"

    9:00 am–12:30 pm

    Lincoln 4

    You're respected by people inside and outside of your group. When there are hard problems to be resolved, you're the person who makes things work on time and under budget. More than once, you've pulled a complete miracle out of your hat. And when the team needs a new manager or team lead, you're the person at the top of the list.

    Unfortunately, the skills that make a good technical staff member do not always translate well to management. How do you make the leap? Scott’s presentation will help you prioritize and organize the tasks that every successful manager has to master.

    Who should attend:

    Technical people who are assuming (or are interested in assuming) a leadership role in their organizations.

    Take back to work:

    This class will share some of the tips and skills that I have learned the hard way, by transitioning from a solid performer to a team manager. You will become aware of the many resources available to help you make the transition.

    Topics include:
    • Defeating the "Peter Principle"
    • Becoming a good leader
    • The manager’s toolkit
    • Your 90-day transition plan 
    • Assessing team strengths, weaknesses, and needs
    • Building your team
    • Evaluating team members
    • Managing expectations and relationships
    • Creating your learning plan
    • Measuring success
    • Fitting the right strategy to your situation
    • Time management
    • Effective meetings
    • Managing projects effectively
    • Conflict management
    Additional Materials:

    This session includes a number of pen-and-paper exercises.

    Half Day Afternoon

    Matt Harrison, MetaSnake

    Matt Harrison is a consultant and corporate trainer at MetaSnake, focusing on Python and Data Science. He has been using Python since 2000 across the domains of search, build management and testing, business intelligence, and storage.

    Matt also runs pycast.io, a screencasting service providing instruction on Python and Data Science. He occasionally tweets useful Python-related information at @__mharrison__.

    M6
    Hands-on Introduction to Python Analytic Stack
    Matt Harrison, MetaSnake

    1:30 pm–5:00 pm

    Thurgood Marshall East

    Python is in high demand. In addition to being used purely for development, Python programming is one of the top skills for data scientists because it is a full-stack analytics package. You can access data with it (or crawl the web to gather data), slice it and dice it, throw it into a database, visualize it, and perform machine learning with it.

    This course will cover some of the tools that data scientists are using to analyze data. Specifically, we will introduce the IPython Notebook (Jupyter), the pandas toolkit, and the plotting facilities in matplotlib.

    Who should attend:

    Developers or admins who know Python or another language and want to learn about the analytic stack, specifically IPython Notebook, pandas, and matplotlib.

    Take back to work:

    Attendees will return to work with a basic understanding of the Python tools for data analysis.

    Topics include:
    • Anaconda Distribution
    • IPython Notebook
    • Navigation in Notebook
    • Executing code in Notebook
    • pandas Introduction
    • Getting data
    • Cleaning data
    • Examining data
    • Filtering, joining and updating data
    • Working with aggregates
    • Creating pivot tables
    • Plotting Introduction
    • matplotlib architecture
    • Line plots
    • Histograms
    • Box Plots
    • Tweaking axis, labels, legends
    Additional Materials:

    Attendees should have the (free) Anaconda stack installed on their machine. This is a large download, so please do it before the class. Downloads for Windows, Mac, and Linux can be found at http://continuum.io/downloads
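
    The pandas workflow outlined above can be sketched in a few lines. The data here is a made-up inline table standing in for a real dataset, and "region" and "units" are hypothetical column names:

```python
import pandas as pd

# Hypothetical inline data standing in for a real dataset;
# "region" and "units" are made-up column names.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units":  [10, 3, 7, 12],
})

# Examining data
print(df.describe())

# Filtering
east = df[df["region"] == "east"]

# Working with aggregates
totals = df.groupby("region")["units"].sum()
print(totals)
```

    Inside the IPython Notebook, each of these steps would typically live in its own cell so intermediate results can be inspected as you go.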

    Alison Chaiken, Mentor Graphics

    Alison Chaiken has been working with and contributing to systemd as part of her work on automotive Linux systems programming at Mentor Graphics' Embedded Software Division. She has given several presentations about systemd, most notably at the 2015 meetings of the Southern California Linux Expo and the Embedded Linux Conference.

    M7
    systemd, the Next-Generation Linux System Manager
    Alison Chaiken, Mentor Graphics

    1:30 pm–5:00 pm

    Thurgood Marshall West

    Systemd is now at the heart of all major Linux distributions. Its fundamental design principle is to pull duplicate daemon-management code out of SysVinit's bash scripts and migrate it into systemd's core or the kernel. systemd is modular, asynchronous, and fully configurable via ASCII-formatted declarative config files.

    Attendees will learn to configure, diagnose, and control systemd using its native tools such as systemctl and journalctl. Hands-on examples will show systemd's unit files and directory hierarchy, illustrate socket-based activation, and demonstrate management of resources and security. Time permitting, attendees will learn about systemd and containers.

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Familiarity with systemd's configuration, tools, and basic principles.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system monitoring with journald
    • Resource management via systemd's cgroups interface 
    • Simple security management with systemd and the kernel's capabilities
    • systemd and virtualization
    Additional Materials:

    If network connectivity permits, we will try to offer virtual machine instances with systemd baked in. However, attendees should be able to work the examples on any fully patched, up-to-date Linux system that includes systemd and that can be rebooted. A Linux instance inside a VM will be most convenient in either case, but native installation should also work.
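
    As a taste of the unit-file anatomy covered in class, here is a minimal sketch of a service unit. The service name "myapp" and all directive values are hypothetical illustrations, not recommendations:

```ini
# /etc/systemd/system/myapp.service -- hypothetical example unit
[Unit]
Description=Example application service
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp --serve
Restart=on-failure
# Resource management via systemd's cgroups interface
MemoryLimit=512M
# Simple security hardening via the kernel's capabilities
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
```

    After `systemctl daemon-reload`, such a unit would be started with `systemctl start myapp` and inspected with `journalctl -u myapp`.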

    Thomas A. Limoncelli, Stack Overflow

    Tom is an internationally recognized author, speaker, and system administrator. His new book, The Practice of Cloud System Administration, launched last year. His past books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Pearson). In 2005, he received the USENIX LISA Outstanding Achievement Award.

    He works in New York City at Stack Exchange, home of Careers.Stackoverflow.com, and previously worked at Google and Bell Labs. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    M8
    Introduction to Time Management for Busy Devs and Ops
    Thomas A. Limoncelli, Stack Overflow


    1:30 pm–5:00 pm

    Lincoln 4

    Devs and Ops have a time management problem: There are too many projects. Too many interruptions. Too many distractions. This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • Why typical “time management” strategies don’t work for sysadmins
    • What makes “to-do” lists fail, and how to make them work
    • How to eliminate “I forgot” from your vocabulary
    • How to manage interruptions: preventing them, managing the ones you get
    • Delegating to coworkers without them knowing
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Task grouping: batching, sharding, and multitasking
    • Handling situations like a big outage disrupting your perfectly planned day
    Additional Materials:
    Tuesday, November 10, 2015

    Full Day

    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux kernel developer and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation and is currently employed at Google. Theodore is a Debian developer and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    T1
    Linux Performance Tuning
    Theodore Ts'o, Google

    9:00 am–5:00 pm

    Thurgood Marshall South

    The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

    This class will cover the tools that can be used to monitor and analyze a Linux system as well as key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • File system and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies
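
    A few of the memory and network tuning knobs above are typically exposed through sysctl. The values below are purely illustrative, not recommendations; appropriate settings depend on the workload being characterized:

```ini
# /etc/sysctl.d/99-tuning.conf -- illustrative values only
# Reduce the kernel's tendency to swap out application memory
vm.swappiness = 10
# Allow more dirty page cache to accumulate before forcing writeback
vm.dirty_ratio = 20
# Larger maximum TCP receive/send buffers for high-bandwidth links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```

    Settings in such a file would be applied with `sysctl --system`, or tested one at a time with `sysctl -w`.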
    Additional Materials:

    Matt Harrison, MetaSnake

    Matt Harrison is a consultant and corporate trainer at MetaSnake, focusing on Python and Data Science. He has been using Python since 2000 across the domains of search, build management and testing, business intelligence, and storage.

    Matt also runs pycast.io, a screencasting service providing instruction on Python and Data Science. He occasionally tweets useful Python-related information at @__mharrison__.

    T2
    Hands-on Intermediate Python
    Matt Harrison, MetaSnake

    9:00 am–5:00 pm

    Thurgood Marshall East

    Got the basics of Python down but want to dig in a little more?

    Have you wondered about functional programming, closures, decorators, context managers, generators, or list comprehensions, when you should use them, and how to test them? This hands-on tutorial will cover these intermediate subjects in detail. 

    Who should attend:

    Developers with a basic understanding of Python who want to learn more advanced features

    Take back to work:

    You will go back to work with a knowledge of advanced Python constructs and idioms, and know when to use them.

    Topics include:
    • Closures
    • Decorators
    • Class decorators
    • Properties
    • Context managers
    • List comprehensions
    • Iterator pattern
    • Generators
    Additional Materials:

    Please install Python (2 or 3) on your laptop. Most Linux and Mac computers already have Python installed. Windows users can find a download at python.org.
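
    Several of the constructs on the topics list (closures, decorators, generators, and list comprehensions) fit in a short sketch. The names `trace`, `square`, and `countdown` are hypothetical, chosen only for illustration:

```python
import functools

def trace(func):
    """Decorator: wrap func and record each call's arguments in
    `calls`, a list captured by closure."""
    calls = []
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = calls
    return wrapper

@trace
def square(x):
    return x * x

def countdown(n):
    """Generator: lazily yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

# A list comprehension driving the decorated function
squares = [square(i) for i in range(4)]
```

    Here `wrapper` closes over `calls`, so every decorated function carries its own call history, and `countdown` produces values only as they are consumed.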

    Nick Feamster, Princeton University

    Nick Feamster is a professor in the Computer Science Department at Princeton University. Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, with a focus on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the SIGCOMM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), the NSDI conference (fault detection in router configuration), USENIX Security (circumventing web censorship using Infranet), and USENIX Security (web cookie analysis).

    T3
    Software Defined Networking: Principles and Practice
    Nick Feamster, Princeton University

    9:00 am–5:00 pm

    Thurgood Marshall North

    This course introduces software defined networking, an emerging paradigm in computer networking that allows a logically centralized software program to control the behavior of an entire network.

    Separating a network's control logic from the underlying physical routers and switches that forward traffic allows network operators to write high-level control programs that specify the behavior of an entire network. This is in contrast to conventional networks, where network operators must codify functionality in terms of low-level device configuration.

    Logically centralized network control makes it possible for operators to specify more complex tasks that involve integrating many disjoint network functions (e.g., security, resource control, prioritization) into a single control framework. This allows network operators to create more sophisticated policies and makes networks easier to configure, manage, troubleshoot, and debug.

    Who should attend:

    Network operators, software developers, graduate students, and other professionals who want to gain hands-on experience with SDN.

    Take back to work:

    Knowledge about how to program modern SDN controllers for specific use cases, complete with examples. We'll provide a course virtual machine, so—obviously—that is for keeps!

    Topics include:
    • History and evolution of SDN
    • Control and data plane separation
    • Control Plane and Modern Controllers (Ryu, ONOS)
    • Network Virtualization (Docker)
    • Data Plane (P4)
    • Programming SDNs (Frenetic)
    • Verification and Debugging
    • Use Cases and Looking Forward (SDX, security, etc.)
    Additional Materials:

    Our Vagrant installation may be updated before October, but the current version is available at https://github.com/PrincetonUniversity/Coursera-SDN
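
    The core idea (one logically centralized program computing the behavior of every switch) can be illustrated with a toy sketch. This is not the API of Ryu, ONOS, or any real controller; the topology, hosts, and policy below are entirely hypothetical:

```python
# Toy illustration of logically centralized control: one high-level
# policy is compiled into per-switch match/action flow rules.

TOPOLOGY = {            # hypothetical: switch -> {destination host: output port}
    "s1": {"h1": 1, "h2": 2},
    "s2": {"h1": 3, "h2": 4},
}

BLOCKED = {("h1", "h2")}   # a network-wide security policy, stated once

def flow_table(switch):
    """Translate the global policy into this switch's flow rules."""
    rules = []
    for src in ("h1", "h2"):
        for dst, port in TOPOLOGY[switch].items():
            if src == dst:
                continue
            action = "drop" if (src, dst) in BLOCKED else "fwd:%d" % port
            rules.append((src, dst, action))
    return rules
```

    The point of the sketch is that the security policy is written once, centrally, rather than being re-encoded in each device's low-level configuration.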

    Half Day Morning

    Jerome Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life, he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T4
    Introduction to Docker and Containers
    Jerome Petazzoni, Docker Inc.

    9:00 am–12:30 pm

    Thurgood Marshall West

    Docker is an open platform to build, ship, and run any Linux application, anywhere. It can be used in many ways: providing clean, isolated development environments; quickly spinning up test instances for CI purposes; ensuring coherence between development and production platform; and much more.

    This hands-on tutorial will give you an opportunity to dive in and see Docker in action. You will learn about Docker basic concepts, how to run containers, create your own images, interact with the "Docker Hub," and stack multiple containers to compose complex applications.

    Who should attend:

    This tutorial is relevant for both developers and sysadmins. If you have heard about Docker, but you haven't used it (or very little), this will get you started with a fast-paced, hands-on introduction.

    No previous knowledge of Docker or containers is required, but you will need some basic UNIX command-line skills.

    Take back to work:

    You will interact with a real Docker host and manipulate containers and images. After the tutorial, you will be able to use Docker in simple scenarios, and you will know where to look for further information to cover advanced use cases like orchestration, dynamic resource scheduling, load balancing, and more.

    Topics include:
    • What's Docker?
    • Running our first container
    • Understanding Docker images
    • Running a network service in a container
    • Building images interactively
    • Building images with Dockerfile
    • Pushing and pulling images
    • Development workflow with Docker
    • Network and volumes
    • Stacking complex apps with Compose
    Additional Materials:

    You won't need to pre-install Docker before the course; each student will be given credentials to connect to an individual virtual machine. All you need is a computer with an SSH client.
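
    To give a flavor of the "building images with Dockerfile" topic, here is a minimal sketch for a hypothetical Python web service (the file names and port are made up for illustration):

```dockerfile
# Hypothetical Dockerfile for a small Python web service
FROM python:2.7
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "app.py"]
```

    Such an image would be built with `docker build -t myapp .` and run with `docker run -p 8000:8000 myapp`.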

    Daniel Walsh, Red Hat, Inc.

    Daniel Walsh has worked in the computer security field for over 30 years. Dan joined Red Hat in August 2001. Dan has led the RHEL Docker enablement team since August 2013, and has been working on container technology for several years. He has led the SELinux project, concentrating on the application space and policy development. Dan helped develop sVirt, Secure Virtualization. He also created the SELinux Sandbox, the Xguest user, and the Secure Kiosk. Previously, Dan worked on Netect/Bindview's Vulnerability Assessment products and at Digital Equipment Corporation on the Athena Project and AltaVista Firewall/Tunnel (VPN) products. Dan has a BA in Mathematics from the College of the Holy Cross and an MS in Computer Science from Worcester Polytechnic Institute.

    T5
    Understanding SELinux
    Daniel Walsh, Red Hat, Inc.

    9:00 am–12:30 pm

    Lincoln 4

    The tutorial will cover basic SELinux concepts, but more from a usability point of view. What is SELinux trying to tell me? How does SELinux enforcement work? How does SELinux work in virtualization and containers? How can SELinux work in the enterprise?

    Who should attend:

    System administrators, security professionals, and developers.

    Take back to work:

    A greater understanding of SELinux; how to turn it on and leave it on. How to deal with SELinux issues.

    Topics include:
    • What is SELinux trying to tell me?
    • How does SELinux protect my virtualization and container systems?
    • How do I manage SELinux in the enterprise?
    Additional Materials:

    Attendees should have a VM that supports SELinux, preferably Fedora 22 or Red Hat Enterprise Linux 7.

    Half Day Afternoon

    Jerome Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life, he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T6
    Advanced Docker Concepts and Container Orchestration
    Jerome Petazzoni, Docker Inc.

    1:30 pm–5:00 pm

    Thurgood Marshall West

    You have installed Docker, you know how to run containers, and have written Dockerfiles to build container images for your applications (or parts of your applications). Now comes the next part: connecting multiple containers together and deploying them on a cluster of Docker hosts instead of your single development machine.

    This is what we will cover, along with some production-related questions: How to manage logs? Backups? Remote access? Security upgrades?

    In this workshop, we will present the networking model of containers and cover how to run containerized applications spanning multiple Docker hosts. We will give recipes to implement cleanly various ops tasks including logging, backups, and more. Finally, we will discuss orchestration and scheduling. We will present solutions like Swarm and Mesos, and see some of them in action on an actual microservices architecture.

    Who should attend:

    Developers and system administrators who are familiar with basic Docker concepts and want to learn how to deploy Docker for real-world applications.

    Take back to work:

    You will know how to deploy Docker "for real," and how to perform common operational tasks with containers. You will learn about patterns and best practices to scale applications with Docker and containers.

    Topics include:
    • Pre-requirements
    • VM environment
    • Our sample application
    • Running the whole app on a single node
    • Finding bottlenecks
    • Scaling workers on a single node
    • Scaling HTTP on a single node
    • Connecting to containers on other hosts
    • Abstracting connection details
    • Backups
    • Logs
    • Security upgrades
    • Network traffic analysis
    • Introducing Swarm
    • Setting up our Swarm cluster
    • Running on Swarm
    • Network plumbing on Swarm
    • Last words
    Additional Materials:

    You won't need to pre-install Docker before the course; each student will be given credentials to connect to an individual virtual machine. All you need is a computer with an SSH client.
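
    As a sketch of how an application's containers can be stacked and connected declaratively, here is a hypothetical Compose file in the version 1 syntax of that era; the service names and port are illustrative:

```yaml
# Hypothetical docker-compose.yml: a web tier linked to a backing store
web:
  build: .
  ports:
    - "8000:8000"
  links:
    - redis
redis:
  image: redis
```

    With such a file, `docker-compose up` would start both containers and wire the web tier to redis; the same description becomes the starting point for scheduling the services across a cluster.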

    Thomas A. Limoncelli, Stack Overflow

    Tom is an internationally recognized author, speaker, and system administrator. His new book, The Practice of Cloud System Administration, launched last year. His past books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Pearson). In 2005, he received the USENIX LISA Outstanding Achievement Award.

    He works in New York City at Stack Exchange, home of Careers.Stackoverflow.com, and previously worked at Google and Bell Labs. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    T7
    How to Not Get Paged: Managing On-call to Reduce Outages
    Thomas A. Limoncelli, Stack Overflow


    1:30 pm–5:00 pm

    Lincoln 4

    People think of “on call” as responding to a pager that beeps because of an outage. In this class, you will learn how to run an on-call system that improves uptime and reduces how often you are paged. We will start with a monitoring philosophy that prevents outages. Then we will discuss how to construct an on-call schedule—possibly in more detail than you've cared about before—but, as a result, it will be more fair and less stressful. We'll discuss how to conduct “fire drills” and “game day exercises” that create antifragile systems. Lastly, we'll discuss how to conduct a postmortem exercise that promotes better communication and prevents future problems.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:
    • Knowledge that makes being on call more fair and less stressful
    • Strategies for using monitoring to improve uptime and reliability
    • Team-training techniques such as "fire drills" and "game day exercises"
    • How to conduct better postmortems/learning retrospectives
    Topics include:
    • Why your monitoring strategy is broken and how to fix it
    • Building a more fair on-call schedule
    • Monitoring to detect outages vs. monitoring to improve reliability
    • Alert review strategies
    • Conducting “fire drills” and “game day exercises”
    • "Blameless postmortem documents"
    Additional Materials:
    Monday, February 22, 2016

    Half Day Morning

    Gary A. Grider, Los Alamos National Laboratory

    Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. As Division Leader, Gary is responsible for all aspects of High Performance Computing technologies and deployment at Los Alamos. Additionally, Gary is responsible for managing the R&D portfolio for keeping the new technology pipeline full to provide solutions to problems in the Lab’s HPC environment, through funding of university and industry partners.

    Gary is also the US Department of Energy Exascale Storage, IO, and Data Management National Co-Coordinator. In this role, Gary helps manage the U.S. government investments in Data Management, Mass Storage, and IO. Gary has 30 active patents/applications in the data storage area and has been working in HPC and HPC-related storage since 1984.

    John Bent, EMC

    John Bent, currently of EMC, soon to be of Dell, formerly of Los Alamos National Lab, has been working on storage systems for over 20 years. After completing his data-aware scheduling dissertation at Wisconsin in 2005, John spent the next 10 years working for Gary designing, maintaining, and measuring some of the world's largest parallel storage systems. Now at EMC, John works in the Office of the CTO helping design and map EMC storage products to emerging workloads in both Enterprise and Extreme IO.

    Some of John’s more influential research has been the Parallel Log-structured File System and the DOE sponsored FastForward project prototyping an exascale storage system with Intel and The HDF Group.  John is a former anthropology major who spent two years spearfishing on the equator while working as a Peace Corps volunteer.  

    Mark Gary, Lawrence Livermore National Laboratory

    Mark Gary is a Deputy Division Leader for the Livermore Computing Division within Computations. In this role, Mark is responsible for the 24x7 operation of LLNL's world-class computing environment. Livermore Computing provides reliable high-performance computers, infrastructure, and services (networks, data archive, operations, file systems, system software, visualization, system administration, user assistance, and consultation) in support of LLNL missions. Mark leads projects ranging from integrated LC planning efforts to external collaborations in support of extreme-scale computing and storage futures.

    Mark has worked on all aspects of High Performance Computing at Livermore over the last 31 years. While the primary focus of his work has been on mass storage and parallel file systems, Mark has also worked on operating systems, driver, and kernel development. He is a co-author of HPSS and UniTree archival storage systems. Mark has co-managed successful government/industry collaborations over the last three decades and has led archival storage and Lustre file system development and operations teams.

    Mark received his B.S. in Computer Science from the University of California, Santa Barbara, in 1984.

    Nicholas Lewis, University of Minnesota and Los Alamos National Laboratory

    Nicholas Lewis is a Ph.D. candidate in the History of Science, Technology, and Medicine Program at the University of Minnesota, Twin Cities.  He received a master's in history from the University of Utah in 2011, and has undergraduate degrees in history and anthropology from Weber State University.  He worked in IT before joining the Charles Babbage Institute's NSF History of Computer Security Project as a graduate research assistant.  He currently works as a GSRA on the History of Supercomputing Project, a collaborative effort between CBI and the High-Performance Computing Division at Los Alamos National Laboratory, where he is currently conducting dissertation research.

    M1
    Innovations, Challenges, and Lessons Learned in HPC Storage Yesterday, Today, and Tomorrow
    Gary A. Grider, Los Alamos National Laboratory

    John Bent, EMC

    Mark Gary, Lawrence Livermore National Laboratory

    Mark Gary is a Deputy Division Leader for the Livermore Computing Division within Computations. In this role, Mark is responsible for the 24x7 operation of LLNL's world-class computing environment. Livermore Computing provides reliable high-performance computers, infrastructure, and services (networks, data archive, operations, file systems, system software, visualization, system administration, user assistance, and consultation) in support of LLNL missions. Mark leads projects ranging from integrated LC planning efforts to external collaborations in support of extreme-scale computing and storage futures.

    Mark has worked on all aspects of High Performance Computing at Livermore over the last 31 years. While the primary focus of his work has been on mass storage and parallel file systems, Mark has also worked on operating systems, driver, and kernel development. He is a co-author of HPSS and UniTree archival storage systems. Mark has co-managed successful government/industry collaborations over the last three decades and has led archival storage and Lustre file system development and operations teams.

    Mark received his B.S. in Computer Science from the University of California, Santa Barbara, in 1984.

     
    Nicholas Lewis, University of Minnesota and Los Alamos National Laboratory

    Nicholas Lewis is a Ph.D. candidate in the History of Science, Technology, and Medicine Program at the University of Minnesota, Twin Cities.  He received a master's in history from the University of Utah in 2011, and has undergraduate degrees in history and anthropology from Weber State University.  He worked in IT before joining the Charles Babbage Institute's NSF History of Computer Security Project as a graduate research assistant.  He currently works as a GSRA on the History of Supercomputing Project, a collaborative effort between CBI and the High-Performance Computing Division at Los Alamos National Laboratory, where he is currently conducting dissertation research.

    9:00 am–12:30 pm

    In this tutorial, we will introduce the audience to the lunatic fringe of extreme high-performance computing and its storage systems. The most difficult challenge in HPC storage is caused by millions (soon to be billions) of simultaneously writing threads.  Although cloud providers handle workloads of comparable, or larger, aggregate scale, the HPC challenge is unique because the concurrent writers are modifying shared data. 

    We will begin with a brief history of HPC covering the previous few decades, bringing us into the petaflop era, which started in 2009.  Then we will discuss the unique computational science in HPC so that the audience can understand the unavoidability of its unique storage challenges.  We will then move into a discussion of archival storage and the hardware and software technologies needed to store today’s exabytes of data forever.  From archive we will move into the parallel file systems of today and will end the lecture portion of the tutorial with a discussion of anticipated HPC storage systems of tomorrow.  Of particular focus will be namespaces that handle concurrent modifications to billions of entries, which we believe will be the largest challenge of the exascale era.

    The tutorial will end with a free-ranging, audience-directed panel.

    Topics include:
    • A brief history lesson about the past 30 years of supercomputers
    • An understanding of what makes HPC computing unique and the entailing storage challenges 
    • An overview of current HPC storage technologies such as burst buffers, parallel file systems, and archival storage
    • A glimpse into the future of HPC storage technologies for both hardware and software 
    • Insights into unique research opportunities to advance HPC storage


    M2
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (Hadoop File System), and OpenStack.


    Half Day Afternoon


    M3
    An Overview of On-Premise File and Object Storage Access Protocols
    Dean Hildebrand, IBM Research—Almaden

    Dean Hildebrand is a Research Staff Member and Master Inventor at the IBM Almaden Research Center and a recognized expert in scalable file systems and object stores. He has authored numerous scientific publications, created over 30 patents, and been the technical program chair and sat on the program committee of numerous conferences. Dr. Hildebrand pioneered pNFS, demonstrating the feasibility of providing standard and scalable access to any file system. He received a B.Sc. degree in computer science from the University of British Columbia in 1998 and M.S. and Ph.D. degrees in computer science from the University of Michigan in 2003 and 2007, respectively.

     
    Bill Owen, IBM

    Bill Owen is a Senior Engineer with the IBM Spectrum™ Scale development team. He is  responsible for the integration of OpenStack with Spectrum Scale, focusing on the Swift object, Cinder block, and Manila file storage components of OpenStack. He has worked in various development roles within IBM for over 15 years. Before joining IBM, Bill developed and deployed grid management systems for electric utilities. Bill holds B.Sc. and M.S. degrees in Electrical Engineering from New Mexico State University.

    1:30 pm–5:00 pm

    This tutorial will provide a technical overview of the latest distributed file and object access protocols. The goal is to provide administrators and developers with the knowledge to choose the best data access protocol for their new applications or determine if their existing file-based applications are good candidates for being ported to using an object access protocol.

    For decades, distributed file systems such as NFS have been the sole method for applications to work with remote data. The emergence of mobile devices, tablets, and the Internet of Things, combined with the global demand for cloud storage, has given rise to numerous new object storage access protocols. While these new protocols are simpler in many ways, and offer several new features, they also come with their own set of access semantics that may cause problems for applications.

    We will cover and contrast NFSv4/v4.1 with both the S3 and Swift object protocols, as well as discuss the challenges of providing both file and object access to a single dataset, including such topics as common identity, ACL, and quota management.



    M4
    Erasure Encoding—Practice and Principles
    Jason Resch, Cleversafe Inc.

    Jason Resch has 17 years of professional software engineering experience and is presently a Software Architect at Cleversafe, Inc.—a company that pioneered applying Erasure Codes to Object Storage. In his nine years at Cleversafe, Jason specialized in developing new algorithms to improve Erasure Code performance and security, and techniques for rebuilding Erasure Coded data. He has 133 issued and 310 pending patents as well as numerous technical conference presentations and published journal papers. Jason graduated from Illinois Institute of Technology in 2006 with a B.S. in Computer Science with a specialization in information security and a minor in psychology. He was recently awarded an IIT Outstanding Young Alumnus Award and is listed in Crain's Chicago Business Tech 50 list (2015).

     
    W. David Schwaderer

    W. David Schwaderer presently consults for Silicon Valley enterprises, many of them specializing in data storage technologies. As a multidisciplinary technologist, he has authored 11 technical books on a wide spectrum of topics ranging from data storage systems, data management, communication signaling, C Language programming, and ASIC core interfacing to Digital Image Processing. David has presented at IEEE and USENIX conferences, Stanford, MIT, Intel, Google, Sun/Oracle Labs, and across greater Silicon Valley. His four innovation Google TechTalks on YouTube have recorded over 40,400 views. David has a master's degree in Applied Mathematics from the California Institute of Technology and an MBA from the University of Southern California. At his recent Joint IEEE Comsoc-CEsoc SCV presentation titled "Broadcast Storage for Video-Intensive Worlds", he was accorded the title "Silicon Valley Icon."

    1:30 pm–5:00 pm

    It's common knowledge that the volume of global data has exploded. Simultaneously, the challenge to store, protect, and access this data securely "at scale" has produced hyperscale hardware and software architectures that continue to subduct traditional enterprise datacenter systems. These new architectures will prove essential in responding to the unrelenting global "data tsunami".

    One important hyperscale data storage methodology is Object Storage. Object Storage often uses Erasure Coding as a means to reduce data loss probabilities while simultaneously economizing data storage capital costs. Erasure Coding's powerful principles are also found in numerous other data retention methodologies, including Information Dispersal Algorithm (IDA) deployments and Secret Sharing, a method of providing shared-data security.

    Unfortunately, understanding Erasure Coding's deployment strategies and powerful foundations can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to many engineers. Luckily, that's totally unnecessary.

    The first part of this tutorial will provide a brief Object Storage and Erasure Coding introduction as a backdrop for a deep exploration of effective Erasure Coding deployment strategies, including performance and bandwidth tradeoff considerations. It will also introduce IDA and Secret Sharing and briefly discuss their relation to Erasure Coding.

    After an intermission, the second part of the tutorial will provide a programming lab based on Python 2.7 programs distributed on the FAST '16 Tutorial Sessions USB thumb drive. This lab should help cement Erasure Code principles and deployment considerations as well as demonstrate their utility. As an example, the programs will illustrate Erasure Code operations using tables as well as on-the-fly calculations—useful in configurations where it is necessary to trade processing cycles for addressable memory.

    This tutorial portion will conclude with an intense but extremely accessible discussion of Erasure Coding principles that will interest attendees desiring a deeper understanding of how Erasure Codes achieve their results. This material will be devoid of the impenetrable mathematical jargon typically prevalent in Erasure Code literature. The discussion progressively examines various Galois Finite Fields in detail, with a brief discussion of GF(2^16).
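    The table-versus-on-the-fly tradeoff mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the tutorial's lab code (which uses Python 2.7); it assumes the commonly used GF(2^8) field with reducing polynomial 0x11b and generator 3.

```python
def gf_mul_slow(a, b, poly=0x11b):
    """Bitwise ("on-the-fly") multiplication in GF(2^8): no tables, more CPU."""
    r = 0
    while b:
        if b & 1:
            r ^= a           # add (XOR) the current partial product
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce modulo the field polynomial
        b >>= 1
    return r

# Precompute log/antilog tables from a generator, enabling table-driven
# multiplication: a*b = EXP[LOG[a] + LOG[b]]. Trades memory for cycles.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_slow(x, 3)    # 3 generates the multiplicative group
for i in range(255, 512):
    EXP[i] = EXP[i - 255]    # doubled table avoids a modulo in the hot path

def gf_mul_table(a, b):
    """Table-driven multiplication in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

    Both routines compute the same products; which one wins depends on whether the deployment can afford the table memory, exactly the tradeoff the lab programs explore.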

    Finally, the tutorial will include discussion from the forthcoming book titled Exabyte Data Preservation, Postponing the Inevitable, co-authored by the speakers and Dr. Ethan Miller of University of California, Santa Cruz.

    Topics include:
    • Brief Object Storage Introduction
    • Erasure Coding and Object Storage
    • Erasure Coding Deployment Strategy and Tradeoff Considerations
    • Information Dispersal Algorithm and Secret Sharing
    • Understanding Galois Finite Fields
    • Galois Finite Field Computations (made extremely accessible)
    • Python 2.7 Galois Finite Field Computation Demonstration Programs
    • Python 2.7 programming lab
    Sunday, December 4, 2016

    Full Day


    S1
    Linux Performance Tuning
    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    9:00 am–5:00 pm

    Constitution Ballroom A

    The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

    This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • Filesystem and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies


    S2
    Automation Tools Bootcamp
    Tyler Fitch, Chef

    Tyler is an Architect in Chef’s Customer Success program, championing successful patterns and delightful experiences in automation to enterprise customers. Prior to working at Chef, he spent a decade as an engineer for Adobe, developing and automating commerce services for adobe.com using a variety of technologies. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

    9:00 am–5:00 pm

    Constitution Ballroom B

    Overview
    The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform and Artifactory to deploy a small application in local VMs.

    We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

    Who should attend:

    Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.

    Take back to work:

    These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

    Topics include:

    Vagrant, Chef, Packer, Docker, Terraform, and Artifactory



    S3
    SRE Classroom: Non-Abstract
    Salim Virji, Google

    Salim Virji is a Site Reliability Engineer at Google. He has worked on infrastructure software, back-end systems, front-end applications, and delightful ways to connect them all. He lives and works in New York City.

    9:00 am–5:00 pm

    Commonwealth Ballroom

    With this hands-on tutorial, you will develop an understanding of how to design, build, and run reliable Internet services at large scale.

    Who should attend:

    This tutorial is suitable for executives who need to specify and evaluate systems, engineers who build systems, and IT professionals who want to run first-class services built with reliable systems.

    Take back to work:

    You will take back an understanding of how to evaluate system designs, how to specify and build large systems, and how to operate these systems in the real world in a way that will scale as the system grows.

    Topics include:
    • Designing Reliable Systems
    • Building Reliable Systems
    • Running Reliable Systems

    Half Day Morning


    S4
    Professional Conduct and Ethics for System Administrators
    Lee Damon, University of Washington

    Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.

    9:00 am–12:30 pm

    Gardner Room

    This introductory tutorial will start by examining some of the ethical responsibilities that come along with access to other users' data, accounts, and confidential information. We will look at several case studies involving both local and cloud usage. All attendees are strongly encouraged to participate in the discussion. Numerous viewpoints will be considered in order to give students a perspective from which to develop their own reasoned response to ethical challenges.

    Who should attend:

    Anyone who is a system administrator or has access to personal/confidential information, or anyone who manages system administrators or makes policy decisions about computer systems and their users. There are no prerequisites for this class.

    Take back to work:

    After completing this tutorial you will be better prepared and able to resolve ethically questionable situations and will have the means to support your decisions.

    Topics include:
    • Why it is important to set your ethical standards before it comes up
    • Who is impacted by "expectations of ethical conduct"
    • Why this isn't just an expectation of system administrators
    • Implicit expectations of ethical behavior
    • Ethics and The Cloud
    • Coercion to violate ethics
    • Well-intentioned violations of privacy
    • Collection, retention, and protection of personal data
    • Management directives vs. friendships
    • Software piracy/copying in a company, group, or department


    S6
    Statistics for Operations: Making Sense out of Data
    Kyrre Begnum, Oslo University College of Applied Sciences

    Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BSc levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

     
    Nicole Forsgren, DORA

    Dr. Nicole Forsgren is an IT impacts expert who shows leaders and practitioners how to unlock the potential of technology change in their organizations. Best known for her work with tech professionals and as the lead investigator on the State of DevOps Reports, she is CEO and Chief Scientist at DORA (DevOps Research and Assessment) and an Academic Partner at Clemson University. In a previous life, she was a professor, sysadmin, and hardware performance analyst.

    9:00 am–12:30 pm

    Fairfax Room

    This tutorial is a course in statistics with a specific focus on system administrators and the types of data they face. We assume little prior knowledge of statistics, cover the most common concepts in descriptive statistics, and apply them to data taken from real-life examples. Our aim is to provide insight into which methods support good interpretation of data: distributions, probabilities, and formulating basic statements about the properties of observed data.

    The first part will cover descriptive statistics for single datasets, including mean, median, mode, range, and distributions. When discussing distributions, we will cover probabilities through percentiles (e.g., a normal distribution is very uncommon in ops data). This session will use a prepared dataset and spreadsheet (LibreOffice or OpenOffice, because it works on all platforms). We have data on the number of players of an online game over a 6-month period. In this exercise, we will analyze the distribution and try to make statements like, “What is the likelihood that we see more than 27,000 simultaneous players?” One of the lessons is that the top 5% of the distribution accounts for almost a doubling in players, which is interesting. We then extend the discussion to include organizational implications: imagine that your job is to buy resources for a service like this, and you have to double your rig to cope with something that is only 5% likely to happen. How would you explain that in a meeting?
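    The percentile exercise above can be sketched in a few lines of Python. The dataset here is synthetic (a lognormal stand-in for the tutorial's six months of player counts), and the 27,000-player threshold is taken from the example question; `percentile` is an ordinary nearest-rank implementation, not the tutorial's spreadsheet formula.

```python
import math
import random
import statistics

random.seed(1)
# Hypothetical hourly player counts over ~6 months (4320 samples),
# drawn from a skewed (lognormal) distribution, as ops data often is.
players = [int(random.lognormvariate(9.6, 0.45)) for _ in range(4320)]

def percentile(data, p):
    """Empirical p-th percentile (nearest-rank method)."""
    s = sorted(data)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]

median = statistics.median(players)
p95 = percentile(players, 95)
# Empirical probability of seeing more than 27,000 simultaneous players
exceed = sum(1 for x in players if x > 27000) / len(players)
print(f"median={median:.0f}  95th pct={p95}  P(>27000)={exceed:.3f}")
```

    With skewed data like this, the 95th percentile sits far above the median, which is exactly the "top 5% nearly doubles the players" effect the session discusses.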

    The second part will discuss comparisons using two common methods that can be calculated in a spreadsheet: correlations and regressions. Correlations will be used as a tool to identify interesting relationships among data; ranked correlation may be considered for two datasets that have the same “flow” but on separate ranges (e.g., the correlation between web requests and database requests). Regression can also be used to identify relationships. For example, using a regression plot between two variables, one could identify bottlenecks by comparing the load of two tiers (db tier vs. web tier). In a scalable system, we would expect a nice 45-degree linear relationship between the two. However, if the database tier struggles before the web tier, we would see the linear approximation slope upward (if the db load is on the y axis) as the load increases.
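    The tier-comparison idea can be sketched as follows. This is an illustrative sketch with made-up load numbers, not course material: it computes a Pearson correlation and a least-squares slope of db-tier load against web-tier load, the two quantities a spreadsheet would give you.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def slope(xs, ys):
    """Least-squares regression slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

web = [10, 20, 30, 40, 50, 60]              # web-tier load samples (made up)
db_healthy = [11, 19, 31, 39, 51, 59]       # ~45-degree line: scales cleanly
db_struggling = [10, 22, 38, 60, 90, 130]   # db load outpacing web load

print(pearson(web, db_healthy))   # near 1.0: tiers move together
print(slope(web, db_healthy))     # near 1.0: the 45-degree case
print(slope(web, db_struggling))  # well above 1: a db-tier bottleneck
```

    A slope near 1 is the healthy 45-degree case; a slope climbing well past 1 is the upward bend that signals the database tier saturating first.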

    Throughout, we will focus on takeaways, coupling the different statistical methods with the types of answers they can provide, like: “Can the average of a dataset explain the outer limits of my data?” It is easy to fall off the wagon with a topic like statistics. We are aware of this risk and will utilize active-learning tools such as Socrative and Kahoot to engage the audience and encourage participation.

    Who should attend:

    Sysadmins who are faced with data overload and wish they had some knowledge of how statistics can be used to make more sense of it. We assume little prior knowledge of statistics, but a basic mathematical proficiency is recommended.

    Take back to work:
    • A fundamental understanding of how descriptive statistics can provide additional insight into the data of the sysadmin world, and a foundation for further self-study of statistics
    • A basic set of statistical approaches that can be used to identify fundamental properties of the data attendees see in their own environments, and to identify patterns in that data
    • The ability to make accurate and clear statements about their metrics that are valuable to the organization
    Topics include:
    • Descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
    • Basic analysis of distributions and probabilities using percentiles typically seen in ops
    • Interpretation of analyses to include team and business implications
    • Regression analysis to suggest predictive relationships, with an emphasis on interpretation and implications
    • Correlation analysis and broad pattern detection (if time allows)
    Additional Materials:

    Half Day Afternoon

    Thomas Limoncelli, StackOverflow.com

    Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com. He's previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs and tweets @YesThatTom. He lives in New Jersey.

    S5
    How to Not Get Paged: Managing On-Call to Reduce Outages
    Thomas Limoncelli, StackOverflow.com


    1:30 pm–5:00 pm

    Fairfax Room

    People think of “on call” as responding to a pager that beeps because of an outage. In this class, you will learn how to run an on-call system that improves uptime and reduces how often you are paged. We will start with a monitoring philosophy that prevents outages. Then we will discuss how to construct an on-call schedule, possibly in more detail than you've cared about before; as a result, it will be more fair and less stressful. We'll discuss how to conduct “fire drills” and “game day exercises” that create antifragile systems. Lastly, we'll discuss how to conduct a postmortem exercise that promotes better communication and prevents future problems.

    Who should attend:

    Managers or sysadmins with on-call responsibility

    Take back to work:
    • Knowledge that makes being on call more fair and less stressful
    • Strategies for using monitoring to improve uptime and reliability
    • Team-training techniques such as "fire drills" and "game day exercises"
    • How to conduct better postmortems/learning retrospectives
    Topics include:
    • Why your monitoring strategy is broken and how to fix it
    • Building a more fair on-call schedule
    • Monitoring to detect outages vs. monitoring to improve reliability
    • Alert review strategies
    • Conducting “fire drills” and “game day exercises”
    • "Blameless postmortem documents"
    Additional Materials:
    S7
    Documentation Techniques for System Administrators
    1:30 pm–5:00 pm

    Gardner Room

    Sysadmins freely acknowledge how important documentation is to their daily lives, and in the same sentence will loudly complain that they don’t have time to produce documentation. This class is about how to produce effective, useful, and timely documentation as part of your normal sysadmin activities. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

    Who should attend:

    System administrators of all types and levels who need to produce documentation for the systems they manage, or who want to improve their documentation skills. Documentation can be the difference that turns you from a good sysadmin to a great sysadmin!

    Take back to work:
    • The skills to improve personal and team documentation quality
    • A solid understanding of how to establish and maintain effective documentation practices
    Topics include:
    • Why system administrators need to document
    • Documentation as part of your daily workflow
    • Targeting your audience
    • Common mistakes made in documentation
    • Tools to assist the documentation process (including effective use of wikis)
    Additional Materials:
    Monday, December 5, 2016

    Half Day Morning

    Courtney Eckhardt, Heroku

    Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don Norman into a wide-reaching and humane concept of operational reliability.

    Lex Neva, Heroku

    Lex Neva is probably not a super-villain. He has six years of experience keeping large services running, including Linden Lab's Second Life, DeviantArt.com, and his current position as a Heroku SRE. While originally trained in computer science, he’s found that he most enjoys applying his software engineering skills to operations. A veteran of many large incidents, he has strong opinions on incident response, on-call sustainability, and reliable infrastructure design, and he currently runs SRE Weekly (sreweekly.com).

    M1
    "I Never Want to Live through This Again!": Running Excellent Retrospectives
    Courtney Eckhardt, Heroku

    Lex Neva, Heroku


    9:00 am–12:30 pm

    Fairfax Room

    Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?

    Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.

    This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.

    Who should attend:

    Managers, tech leads, and anyone interested in retrospective culture and iterating on processes.

    Take back to work:

    Attendees will have the materials and first-hand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.

    Topics include:
    • Why run a retrospective
    • Goal of a retrospective
    • Blameless retrospectives
    • Facilitating: redirecting blame, reframing, drawing people out
    • How to structure a retrospective
    • Preparing for a retrospective
    • Five “why”s/infinite “how”s
    • Human error
    • Achieving follow-through on remediation items
    Additional Materials:

    Ben Breard, Red Hat

    Ben Breard is the Technology Product Manager for Linux Containers at Red Hat, where he focuses on driving the container roadmap and RHEL Atomic Host, and evangelizes open source technology in his free time. Previously he was a Solutions Architect and worked closely with key customers around cloud/systems management, virtualization, and all things RHEL. Ben joined Red Hat in 2010 and currently works out of Dallas, Texas.

    M4
    Demystifying Systemd
    Ben Breard, Red Hat


    9:00 am–12:30 pm

    Constitution Ballroom B

    It's 2016, and at this point why would anyone care about an init system? Well, not only is process management essential to the operating system, but all the hype around things like containers and resource management is making this topic sexy. This session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who's new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
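As a taste of the "anatomy of a systemd unit file" topic, a minimal service unit might look like the following; the unit name, binary path, and values are hypothetical, for illustration only:

```ini
# /etc/systemd/system/example-app.service (hypothetical example)
[Unit]
Description=Example application server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/example-app --port 8080
Restart=on-failure
# Resource management via systemd's cgroups interface:
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```

After editing a unit, `systemctl daemon-reload` followed by `systemctl start example-app` would activate it.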

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Understanding of how systemd works, where to find the configuration files, and how to maintain them.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system logging with the journal
    • Resource management via systemd's cgroups interface
    • Simple security management with systemd and the kernel's capabilities
    • systemd, containers, and virtualization
    Additional Materials:

    John H. Nyhuis

    John H. Nyhuis is an Infrastructure Engineer, serving as IT Director at the Altius Institute for Biomedical Sciences. He brings 20 years of experience in Infrastructure Engineering and IT Management within industry, academic, and medical environments, including extensive experience with scalable system architecture, implementation, optimization, and deployment:

    • Leadership: Experienced at building consensus in highly diverse environments. Project Management (Scrum and LEAN), Risk Management, IT audits and remediation, HIPAA, FERPA
    • Management: Expense Controls, Budgeting, Employee Management, Project Proposals, Process Improvement
    • Vendor Relations: Contract Negotiation, Fundraising / Equipment Donations
    • Architecture/Design: Cloud Computing, Virtualization, Automation, Scalability, Root Cause Analysis
    • Deployments: Massively Parallel Implementations, Global Deployments, Code Management, Release Testing

    In his free time, John serves as an Economic Development Commissioner for the City of Lake Forest Park, in the great state of Washington.

    Lee Damon, University of Washington

    Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.

    M6
    Advanced Communication: Practical Tactics and Strategy
    John H. Nyhuis

    Lee Damon, University of Washington


    9:00 am–12:30 pm

    Gardner Room

    Systems administrators are expected to be intelligent, dedicated, and professional experts in our field. Yet compared to other professions requiring similar education, we often do not receive credit for our efforts, and we receive less respect from our fellow workers.

    This problem doesn’t just affect our personal well-being; businesses make poorer decisions when input from technical people is disregarded or overlooked. As professionals, we are all expected to step up and defend ourselves, our teams, and our projects. Being able to communicate meaningfully and accurately is critical to our success.

    This tutorial will provide practical techniques for both in-person and written interpersonal challenges. Difficult conversations are a part of life as well as business and we need to develop the tools for dealing with them. We will review materials from several sources including our own experiences and will have practical exercises to work through to give attendees a strong starting point for their own difficult communication challenges.

    Who should attend:

    IT Professionals and anyone who must deal with difficult people under stressful conditions.

    Take back to work:
    • How to deal effectively with verbal and written conflict
    • How to identify and stop verbal and written abuse
    • How to maximize your chances to succeed in difficult conversations
    Topics include:
    • E.I.Q. and how to use it
    • Satir Modes of Conversation
    • Verbal Jujitsu
    • Lifescripts
    Additional Materials:

    Full Day

    Caskey L. Dickson, Microsoft Corporation

    Caskey L. Dickson is a Site Reliability Engineer at Microsoft where he is part of the leadership team reinventing operations at Azure. Before that he was at Google where he worked as an SRE/SWE, writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines. He has worked in online services since 1995 when he turned up his first web server and has been online ever since. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Masters in Systems Engineering, and an M.B.A from Loyola Marymount.

    M3
    Everything You Wanted to Know about Operating Systems but Were Afraid to Ask
    Caskey L. Dickson, Microsoft Corporation


    9:00 am–5:00 pm

    Constitution Ballroom B

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of why "it's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.
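As a taste of the material: Bélády's anomaly, mentioned above, is the counterintuitive fact that under FIFO page replacement, giving a process more frames can produce more page faults. A short Python sketch using the classic textbook reference string:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO page replacement."""
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()  # evict the page resident longest
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # -> 9 faults with 3 frames
print(fifo_faults(refs, 4))  # -> 10 faults with 4 frames: the anomaly
```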

    Buckle up.

    Who should attend:
    • All admins who did not take the Comp-Sci academic route and never had a course in OS internals
    • Inexperienced admins whose coursework or training didn't cover OS internals in the depth it should have (modern OS courses have become a shadow of their former selves and commonly require writing no OS code)
    • More experienced admins who haven't had to address these sorts of issues on a regular basis, who probably know a lot about some individual aspects but could benefit from putting everything into a broader context
    Take back to work:

    Attendees will gain a deeper understanding of what goes on inside the kernel and the areas where things can go wrong. We'll explore how little the concept of "system load" captures about the true system state, and attendees will be prepared to improve both their operational response methodologies as well as their monitoring goals.

    Topics include:

    Morning:

    • Scheduling and Process Management
    • Memory Management and the MMU
    • Virtualization and its impact on these

    Afternoon:

    • File System Architecture for sysadmins, covering ext2/3/4, NTFS, and ZFS
    • Storage layer performance, disks, RAID, and SANs
    • The impact of virtualization on these
    Additional Materials:

    Nick Feamster, Princeton University

    Nick Feamster is a professor in the Computer Science Department at Princeton University and the Acting Director of the Princeton University Center for Information Technology Policy (CITP). Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, with an emphasis on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the SIGCOMM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), the NSDI conference (fault detection in router configuration), USENIX Security (circumventing web censorship using Infranet), and USENIX Security (web cookie analysis).

    M8
    Software Defined Networking for Network Administrators
    Nick Feamster, Princeton University


    9:00 am–5:00 pm

    9:30 am–12:30 pm: Commonwealth Ballroom
    1:30 pm–5:00 pm: Back Bay Ballroom D (LISA Lab)

    The tutorial will cover topics in Software Defined Networking (SDN) in a presentation format that is oriented towards network and system administrators. SDN separates the network's control plane (the software that controls how the network behaves) from its data plane (the routers and switches in the network that forward packets).

    This course will cover the aspects of Software Defined Networking that relate most closely to network operations. We will divide the course into four parts:

    1. Overview and motivation of SDN
    2. Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
    3. Network virtualization technologies
    4. Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)

    The after-lunch portion of this class will be held in the LISA Lab.

    Who should attend:

    Beginner and Intermediate Virtual Infrastructure Administrators

    Take back to work:

    Attendees will take back knowledge about SDN that will help them evaluate whether it is an appropriate technology to apply in their own networks.

    Attendees will better understand what SDN is, the types of problems that it can (and cannot) solve, the capabilities of current software controller platforms, and the capabilities (and shortcomings) of existing hardware switches.

    The course will also include "war stories" from successful (and stunted) SDN deployments that will help attendees better evaluate the suitability of SDN for solving their own network management problems.

    Topics include:
    1. Overview and motivation of SDN
    2. Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
      • Ryu
      • Open Daylight
      • An overview of hardware switch capabilities
    3. Network virtualization technologies
      • Nicira NSX
      • FlowVisor
    4. Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)
      • SDX: Software Defined Internet Exchange Points
      • SD-WAN: SDN in Wide Area Networks
      • SDN in home networks
      • SDN in data centers
    Additional Materials:

    Half Day Afternoon

    Robert A. Ballance, Ph.D.

    Dr. Robert Ballance honed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the Long Now Foundation. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989. He is currently serving as a White House Presidential Innovation Fellow.

    M2
    An Introduction to R for System Administrators
    Robert A. Ballance, Ph.D.


    1:30 pm–5:00 pm

    Commonwealth Room

    Data analysis is not just about discovery; it’s about communication. Good communication tells stories. Savvy system administrators provide their management with the background needed to maintain operations, manage budgets, and support users, and provide their coworkers with the insights needed to keep their systems solid.

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline—coming up to speed on R, accessing data, analyzing data, and getting the message out.

    This tutorial is designed to:

    • motivate you to pick up R
    • demonstrate useful techniques using R
    • illustrate ways to simplify your life by automating data analysis and reporting

    In-class demonstrations will be augmented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided. If you plan on working on the exercises, install R and (optionally) RStudio.

    Who should attend:

    System administrators who are awash in operational data and want to do a more efficient job of understanding their data and communicating their findings. Facility with programming and knowledge of basic descriptive statistics is assumed. Prior knowledge of R is not required.

    Take back to work:
    • Acquaintance with R, R packages, and RStudio
    • Understanding where R fits into the system administrator’s tool set
    • Familiarity with basic R data-manipulation techniques
    • Motivation to learn or improve your R skills
    • Next steps in learning and mastering R
    Topics include:
    • Introduction to the R ecosystem
    • R as a language
    • Basic programming in R
    • The data analysis workflow
    • Reading and writing data from files and pipes
    • Data frames and data frame manipulations
    • Exploratory analysis
    • Using the ggplot2 package for graphing
    • Other useful R packages

    Examples will be based on situations encountered during routine system operations.

    Additional Materials:

    Thomas Limoncelli, StackOverflow.com

    Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com. He's previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs and tweets @YesThatTom. He lives in New Jersey.

    M7
    Personal Time Management: The Basics for Sysadmins That Are Overloaded
    Thomas Limoncelli, StackOverflow.com


    1:30 pm–5:00 pm

    Fairfax Room

    Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.

    This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • Why typical “time management” strategies don’t work for sysadmins
    • What makes “to-do” lists fail, and how to make them work
    • How to eliminate “I forgot” from your vocabulary
    • How to manage interruptions: preventing them, managing the ones you get
    • Delegating to coworkers without them knowing
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Task grouping: batching, sharding, and multitasking
    • Handling situations like a big outage disrupting your perfectly planned day
    Additional Materials:

    Deji Akomolafe, Microsoft Applications Virtualization Lead, VMware

    Deji Akomolafe (a CTO Ambassador and Staff Solutions Architect within VMware's Global Field and Partner Readiness Group) specializes in the virtualization of Microsoft Business Critical Applications on the VMware's vSphere platform. Deji is a regular speaker at many industry-leading technical conferences and workshops (including VMworld, SQL Saturday, EMCWorld, and Partners Exchange), presenting technical subject matters related to virtualization and providing technical guidance to help clients enhance their expertise and ability to optimally virtualize and operate their critical applications.

    Cody Chapman, Heraflux Technologies

    Cody Chapman is a Solutions Architect with Heraflux Technologies. His areas of expertise are virtualization, cloud, storage, performance, datacenter architecture, risk mitigation through high availability and disaster recovery, and performing technical exorcisms. He has worked on systems large and small in a wide variety of industries. He is actively working to automate every facet of datacenter and database management. You can read his blog at heraflux.com, and reach him on Twitter at @codyrchapman.

    David Klee, Heraflux Technologies

    David Klee is a Microsoft MVP and VMware vExpert with over seventeen years of IT experience. David is the Founder of Heraflux Technologies, a consultancy focused on data virtualization and performance tuning, datacenter architecture, and business process improvements. You can read his blog at davidklee.net and reach him on Twitter at @kleegeek.

    M5
    Designing Your VMware Virtual Infrastructure for Optimal Performance, Resilience, and Availability—Straight from the Source
    Deji Akomolafe, Microsoft Applications Virtualization Lead, VMware

    Cody Chapman, Heraflux Technologies

    David Klee, Heraflux Technologies

    1:30 pm–5:00 pm

    Constitution Ballroom A

    Downloads:
    M5 Slides (PDF) | M5 Slides (PowerPoint)

    Not very long ago, the very idea of virtualizing production, mission-supporting enterprise applications was so career-threatening that only the brave dared entertain it for longer than a few seconds. Fast forward to now: virtualization is so pervasive and well accepted that the inverse is true—"Virtualize First" is now a standard corporate mandate in large enterprises, and no modern commercial application is exempt.

    Sadly, embracing virtualization has turned out not to be a panacea for everything that ails an enterprise. In fact, virtualization often contributes to sub-optimal performance, availability, recoverability, and agility of many applications in the enterprise—with plenty of frustration, heartburn, reduced productivity, and, yes, interrupted personal lives. In a rush to be part of the "cool crowd," many enterprises fail to identify and account for the intricacies and requirements of the virtualization platform, relegating such considerations to the secondary or tertiary tiers of the "due diligence" scale.

    If you have adopted virtualization as a platform for your mission-critical applications, or if you are in the process of doing so, be sure to attend this tutorial. It will provide a comprehensive and detailed knowledge transfer that enables you to avoid the common pitfalls encountered in a VMware vSphere virtualization project. We will discuss and explain the considerations for successfully running your mission-critical applications in a vSphere-based infrastructure without loss of performance, availability, recoverability, or resilience. The tutorial will go beyond the standard slide-ware and present an actual demonstration of the effects of certain configuration optimization strategies on the overall condition of the virtualized applications and the virtual infrastructure as a whole.

    Who should attend:
    • Infrastructure, Solution and Enterprise Architects
    • Virtual Infrastructure and Applications Administrators
    • Network Administrators
    • IT Operators
    Take back to work:

    The tutorial will be interactive, encouraging questions from participants—so please come in with your own unique and specific questions. The tutorial will provide you with tips and tricks drawn directly from the most current VMware guidance, recommendations, and knowledge-based references, as well as from real-life customer situations.

    Topics include:
    • Virtualization concepts
    • Virtualization stack
    • Hardware abstraction and the relationship and inter-dependencies between the physical and virtual components
    • Pooling and sharing resources in a virtual environment
    • Common assumptions that lead to performance degradation for virtualized applications
    • Configuration optimization that enhances performance
    • Availability and resilience within a VMware vSphere virtual infrastructure
    Additional Materials:
    Tuesday, December 6, 2016

    Full Day

    Jérôme Petazzoni, Docker Inc.

    Jerome works at Docker, where he helps others to containerize all the things. He was part of the team that built, scaled, and operated the dotCloud PAAS, before it became Docker. When annoyed he threatens to replace things with a very small shell script.

    T1
    Deploying and Scaling Applications with Docker Swarm
    Jérôme Petazzoni, Docker Inc.

    9:00 am–5:00 pm

    Constitution Ballroom A

    Docker is an open platform to build, ship, and run any application, anywhere. In this hands-on tutorial, you will learn advanced Docker concepts, and see how to deploy and scale applications using Docker Swarm clustering abilities and other open source tools of the Docker ecosystem.

    This tutorial is living material: it is delivered at least once a month in public sessions all around the U.S. and Europe. Since the Docker platform in general, and Docker Swarm in particular, evolve rapidly, this tutorial evolves as well, following closely the releases of the various components of the Docker ecosystem: Engine, Compose, Swarm, Machine.

    Who should attend:

    Docker users who want production-grade container deployments.

    You should be familiar with Docker and basic Docker commands (docker run, docker ps, and docker stop) as well as the Dockerfile syntax (at least RUN, CMD, and EXPOSE commands). Ideally, you should have experimented with Compose. If you have limited Docker knowledge but consider yourself a quick learner, don't hesitate to attend: there will be numerous examples and demos, and you will be able to test them out on your own Docker cluster!
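
    As a quick self-check of these prerequisites, the sketch below writes a minimal Dockerfile using the RUN, EXPOSE, and CMD instructions and exercises docker build, docker run, docker ps, and docker stop. The base image, image name, and container name are made up for illustration, and the docker commands only run where a Docker engine is actually reachable.

```shell
# Write a minimal Dockerfile exercising RUN, EXPOSE, and CMD.
# The base image and names below are illustrative only.
cat > Dockerfile <<'EOF'
FROM alpine
# RUN executes at image build time
RUN echo "built" > /built.txt
# EXPOSE documents the port the container listens on
EXPOSE 8000
# CMD is the default command when the container starts
CMD ["sh", "-c", "cat /built.txt && sleep 60"]
EOF

# Exercise the basic lifecycle commands, but only where the
# Docker CLI exists and the daemon answers.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    docker build -t demo/prereq .             # build the image
    docker run -d --name prereq demo/prereq   # start a container
    docker ps                                 # list running containers
    docker stop prereq                        # stop it again
fi
```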

    Take back to work:

    After this tutorial, you will know how to deploy applications to production with Docker and containers. We will tackle lots of frequently asked questions in the Docker ecosystem: how to manage the lifecycle of container images, how to implement service discovery across Docker clusters, how to load balance traffic on scaled applications, how to perform security upgrades, and more.

    Topics include:

    Containers, Docker, Orchestration, Scheduling, and Service Discovery

    Additional Materials:

    Branson Matheson, Cisco Systems, Inc.

    Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson holds several credentials and generally likes to spend time responding to the statement "I bet you can't...."

    T2
    Applied Security Tools for Sysadmins
    Branson Matheson, Cisco Systems, Inc.

    9:00 am–5:00 pm

    Constitution Ballroom B

    This course is a direct response to the many requests I have received for “more tools,” and I have written it with an eye to meeting that goal. The class follows a “secure and defend” plan: we will spend the majority of class time learning offensive and defensive tools, then break into teams to secure, and set up monitoring for, the provided on-site test environments. In the second phase, students come to LISA Lab to use the attack tools and defend their environments against their peers, with scheduled times for the teams, either independently or in groups, to deal with staged “incidents.”

    This will be a coordinated event that I will support both in my role as instructor and as a member of LISA Build and LISA Lab, and the second phase will run throughout the conference. I will have some form of visual scorekeeping in LISA Lab where people can walk in and see what's going on with the event. At the end, I will provide prizes and/or accolades for the best teams.

    Who should attend:

    Participants should be beginning-to-advanced system administrators of any stripe with an interest in IT security and a desire to learn how to attack and defend against potential threats in their environments. Participants are required to have experience with the *nix command line, basic networking, and an understanding of virtual environments.

    Take back to work:

    Knowledge of how to evaluate an environment, find vulnerabilities and mitigate them, improve security monitoring, and detect and defend against attacks. Students will learn how to use a working security toolkit that can be applied directly to their home environments.

    Topics include:
    • Basic security concepts and architectural design
    • How to scope and scan an environment using readily available tools and general sysadmin knowledge
    • How to identify, understand, and remediate vulnerabilities, and verify the solution
    • How to monitor and react to incursions
    Additional Materials:

    Michael Schloh, Europalab Networks

    Michael Schloh von Bennewitz is a computer scientist specializing in network engineering, embedded design, and mobile platform development. Responsible for research, development, and maintenance of packages in several community software repositories, he actively contributes to the open source development community.

    Michael speaks four languages fluently and presents at technical events every year. He teaches workshops exclusively on Internet of Things and Embedded Computing technology, traveling with a mobile laboratory of over 300 sensors, actuators, and computer devices.

    Michael's IoT knowledge profits from years of work at telecoms and relationships with industry leaders. He is an Intel innovator, Samsung partner, and Mozilla committer with the mandate to promote IoT technology.

    Additional information can be found at http://michael.schloh.com/

    T7
    Interfacing with a Low Power IoT
    Michael Schloh, Europalab Networks

    9:00 am–5:00 pm

    Back Bay Ballroom D (LISA Lab)

    In this hands-on-hardware workshop, we explore the boundaries of traditional systems and where they converge with networks of billions of embedded devices. Starting with the theory of the Internet of Things, related data transports, and common protocols, we create embedded systems using a set of loaned hardware. Focusing on 802.3, 802.11, and Bluetooth Smart transports, we implement our own IoT edge routers serving our own network of sensor and actuator embedded computers. We will implement a simple messaging application using MQTT or AMQP, and round out the training by integrating our piecemeal solutions into a full-fledged IoT system.

    Who should attend:

    Intermediate hardware or network engineers benefit most from this workshop.

    Take back to work:

    Attendees will take back to work a broad understanding of what will power the next generation of embedded devices and how those devices interface with traditional large Internet systems.

    Topics include:

    Device classes

    • Whirlwind tour of hardware
    • Vendor market trends
    • Small manufacturing

    Transports

    • Copper 802.3
    • Wireless 802.11
    • Bluetooth 1-3
    • Bluetooth Smart
    • Zigbee and ANT+
    • Z-Wave
    • 6LoWPAN
    • LoRa and SigFox

    Protocols

    • Legacy
    • MQTT
    • AMQP
    • CoAP
    • ZeroMQ
    Additional Materials:

    Half Day Morning

    Jeanne Schock

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.

    T3
    Speed up Your Change Control
    Jeanne Schock

    9:00 am–12:30 pm

    Fairfax Room

    Speedy Change Control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risk. In "The Visible Ops Handbook", authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: “change management does not slow things down in these organizations.” This tutorial will help anyone wishing to implement phase one of the Visible Ops Handbook: “Stabilize the Patient” and “Modify First Response.” While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past successes and failures. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert with over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services running on complex technologies across international data centers.

    Who should attend:

    Technical people and managers who participate in a change management process, or who would like to build one but are afraid that doing so will slow them down.

    Take back to work:
    • Templates for change request types and procedures
    • Templates for creating standard operating procedures
    • ITIL-aligned talking points for making your case for these process improvements
    • Better understanding of change management and process in general
    Topics include:
    • Different change types
    • Assessing risks and potential impact
    • Defining change authorities specific for each change type
    • Metrics for measuring change process performance against goals
    • Release and deployment management
    • DevOps
    • Continuous delivery
    Additional Materials:

    Brendan Gregg, Netflix

    Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He was previously a performance lead and kernel engineer at Sun Microsystems, where he developed the ZFS L2ARC and led performance investigations. He has also created numerous performance analysis tools, which have been included in multiple operating systems. His recent work includes developing methodologies and visualizations for performance analysis.

    Sasha Goldshtein, Sela Group

    Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Azure MRS, a Pluralsight author, and an international consultant and trainer. Sasha is a book author, a prolific blogger and open source contributor, and author of numerous training courses including .NET Debugging, .NET Performance, Android Application Development, and Modern C++. His consulting work revolves mainly around distributed architecture, production debugging and performance diagnostics, and mobile app development.

    T5
    Linux Performance Analysis Using Perf and BPF
    Brendan Gregg, Netflix

    Sasha Goldshtein, Sela Group

    9:00 am–12:30 pm

    Commonwealth Room

    This tutorial will give you experience with two powerful Linux performance analysis tools: perf and BPF. Learn how to profile CPU usage, create flame graphs, trace TCP connections, investigate file system latency, explore software internals, and more.

    perf_events, aka "perf" after its front-end, is a Linux mainline tool for profiling and tracing. We will summarize some of its most useful one-liners, and discuss real world challenges and solutions for using it with JIT runtimes (Java, Node.js), and in cloud environments.

    Enhanced BPF (Berkeley Packet Filter) is a new in-kernel programmable runtime with a variety of uses, including extending Linux static and dynamic tracing capabilities. We'll primarily focus on the BPF Compiler Collection (bcc) front-end for BPF, which provides a toolkit of many ready-to-run analysis tools, including DTrace classics like execsnoop, opensnoop, and biolatency, and new tools including memleak, trace, and argdist. bcc also provides Python and C interfaces for writing your own powerful dynamic tracing-based tools, and we'll show how that can be done.

    We will spend more time exploring the new world of BPF and its features that were made available in the Linux 4.4 release. Enhanced BPF has become a recent hotspot for systems innovation, helping create other new technologies including bcc, the kernel connection multiplexer (KCM), and eXpress Data Path (XDP), and is being developed by engineers from many companies, including Facebook, PLUMgrid, Netflix, Cisco, Huawei, GitHub, Sela, and Intel. Join this workshop to get up to speed with BPF for tracing, try some hands-on labs, and gain real experience with the technology from contributor and performance expert Brendan Gregg.
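
    As a taste of the kind of perf one-liners covered, the sketch below captures CPU stack samples and folds them for a flame graph. The stackcollapse-perf.pl and flamegraph.pl paths are assumptions (those scripts come from Brendan Gregg's FlameGraph repository and are assumed to sit in the current directory), and the pipeline is only attempted where perf is installed.

```shell
# Save the classic CPU flame graph pipeline to a helper script.
# The FlameGraph script paths are assumptions; adjust to taste.
cat > profile.sh <<'EOF'
#!/bin/sh
perf record -F 99 -a -g -- sleep 10    # sample on-CPU stacks at 99 Hz, all CPUs
perf script > out.stacks               # dump the samples as text
./stackcollapse-perf.pl out.stacks > out.folded   # fold each stack to one line
./flamegraph.pl out.folded > flame.svg            # render the interactive SVG
EOF
chmod +x profile.sh

# Only attempt the profile where perf exists; it usually also
# needs root, so tolerate failure rather than abort.
if command -v perf >/dev/null 2>&1; then
    ./profile.sh || echo "perf run failed (needs root and the FlameGraph scripts)"
fi
```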

    Topics include:
    • perf
    • Enhanced Berkeley Packet Filter (BPF)
    • BPF Compiler Collection
    • Python and C interfaces to BPF
    Additional Materials:

    Half Day Afternoon

    Chris McEniry, Sony Interactive Entertainment

    Chris "Mac" McEniry is a practicing sysadmin and architect responsible for running a large E-commerce and gaming service. He's been working and developing in an operational capacity for 15+ years. In his free time, he builds tools and thinks about efficiency.

    T4
    Go: The Programming Language
    Chris McEniry, Sony Interactive Entertainment

    1:30 pm–5:00 pm

    Commonwealth Room

    Go is a relatively young language that was built with systems programming in mind. Its compact yet powerful grammar aids the swift development of efficient tools for everyday work. Despite its youth, it has already taken a prominent position among system tools. This hands-on tutorial focuses on reading and writing the Go programming language.

    Who should attend:

    Anyone with a little bit of programming experience who wants to pick up Go

    Take back to work:

    The ability to read and write Go

    Topics include:
    • Control Structures
    • Types
    • Functions
    • Goroutines
    • Channels
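
    To give a flavor of the last two topics, the sketch below writes a tiny program in which a goroutine sends one message over a channel to main. The file name is arbitrary, and the program is compiled and run only where a Go toolchain is installed.

```shell
# A goroutine sending one value over an unbuffered channel.
cat > hello.go <<'EOF'
package main

import "fmt"

func main() {
	ch := make(chan string)   // unbuffered channel of strings
	go func() {               // goroutine runs concurrently with main
		ch <- "hello from a goroutine"
	}()
	fmt.Println(<-ch)         // receive blocks until the send happens
}
EOF

# Run it only where the Go toolchain is available.
if command -v go >/dev/null 2>&1; then
    go run hello.go           # prints: hello from a goroutine
fi
```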
    Additional Materials:

    Jeanne Schock

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.

    T6
    Building a Process to Manage the Problems That Cause Incidents
    Jeanne Schock

    1:30 pm–5:00 pm

    Fairfax Room

    All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer—restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.

    The goals of the Problem Management process are to prevent recurrence of incidents, prevent problems and resulting incidents from happening, and minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them be measurably more consistent, mature, and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process. I am a certified ITIL Expert. I designed, implemented, and then managed a problem process for four years at a registry and DNS service provider with complex technologies across international datacenters.

    Who should attend:

    Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.

    Take back to work:
    • A step-by-step guide for building and implementing a problem process and the reasons behind each step
    • A process template with examples that can be easily adapted to fit your organization’s current and future needs
    • Instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
    • Guidance for getting buy-in from peers and managers
    Topics include:
    • Incident response vs. problem resolution
    • Root cause analysis techniques
    • Making decisions that are aligned with business objectives
    • Getting buy-in from teammates, colleagues and managers
    • Proactive problem management
    • After-action reviews as a tool
    • “Root cause” vs. multiple causes
    Additional Materials:
    Monday, February 27, 2017

    Half Day Morning

    Michael Bender, Stony Brook University

    Michael A. Bender is a professor of computer science at Stony Brook University. He was Founder and Chief Scientist at Tokutek, Inc, an enterprise database company, which was acquired by Percona in 2015.

    Bender's research interests span the areas of data structures and algorithms, I/O-efficient computing, scheduling, and parallel computing. He has coauthored over 125 articles on these and other topics. He has won several awards, including an R&D 100 Award, a Test-of-Time award, two Best Paper Awards, and five awards for graduate and undergraduate teaching.

     

    Bender received his B.A. in Applied Mathematics from Harvard University in 1992 and obtained a D.E.A. in Computer Science from the Ecole Normale Superieure de Lyon, France in 1993. He completed a Ph.D. on Scheduling Algorithms from Harvard University in 1998. He has held Visiting Scientist positions at both MIT and King's College London.

    Martin Farach-Colton, Rutgers University

    Martin Farach-Colton is a professor of computer science at Rutgers University. He was Founder and CTO at Tokutek, Inc, an enterprise database company, which was acquired by Percona in 2015.

    Farach-Colton works on pure and applied algorithms in I/O-efficient storage systems, streaming algorithms and string matching. He has coauthored over 150 articles. He has won several awards, including a Sloan Foundation Fellowship, a Test-of-Time award, a Best Paper award, and teaching awards. He was named a distinguished alum of the University of Maryland Computer Science Department on the occasion of their 40th anniversary.

    Farach-Colton received his B.S. in Mathematics and Chemistry from the University of South Carolina in 1984. He received his M.D. from Johns Hopkins in 1988 and his Ph.D. from the University of Maryland in 1991. He has been a Member of Technical Staff at Bell Labs (1997-98) and was an early employee of Google, Inc. (2000-2002).

    Rob Johnson, Stony Brook University

    Rob Johnson is a Research Assistant Professor at Stony Brook University and conducts research on Security, Big Data Algorithms, and Cryptography. He is director of the Security, Programming Languages, And Theory (SPLAT) lab at Stony Brook, the Cryptography Lab at the New York Center for Excellence in Wireless and Information Technology (CEWIT), and the Smart Grid Cyber-Security Testing Lab of the New York Advanced Energy Research and Technology Center (AERTC).

    He does theoretical work with an impact on the real world. He developed BetrFS, a file system that uses recent advances in data structures to improve performance on some operations by over an order of magnitude. He invented the quotient filter, a high-performance alternative to the Bloom filter for Big Data applications. He founded cache-adaptive analysis, a theoretical framework for designing and analyzing algorithms that dynamically share memory with other processes. He broke the High-bandwidth Digital Content Protection (HDCP) crypto-system used in almost all DVD players and TVs. He co-authored CQual, a static analysis tool that has found dozens of bugs in the Linux kernel and has been used to audit the entire Debian Linux distribution for format-string bugs.

    Rob completed his Ph.D. at UC Berkeley in 2006.

    M1
    Write Optimization in Storage Systems
    Michael Bender, Stony Brook University

    Martin Farach-Colton, Rutgers University

    Rob Johnson, Stony Brook University

    9:00 am–12:30 pm

    Write optimization refers to a set of techniques used to improve the performance of databases and file systems. Examples of write optimized data structures include Log-Structured Merge trees (LSMs) and Bε-trees. Systems that use such data structures include BetrFS, HBase, LevelDB, TableFS, TokuMX, TokuDB, etc.

    This tutorial reviews write optimization from the perspectives of analysis and engineering. We provide a framework for understanding which data structures will perform well on which workloads.
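
    As a back-of-the-envelope illustration of the idea (a sketch of ours, not the tutorial's own material), an LSM-style store batches updates in an in-memory buffer and flushes them as immutable sorted runs, trading some read work for cheap, sequential writes:

```python
# Toy sketch of an LSM-style write-optimized store (illustrative only):
# writes land in an in-memory buffer and are flushed to immutable sorted
# runs; reads must consult the buffer and every run, newest first.

class ToyLSM:
    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit
        self.memtable = {}      # in-memory write buffer
        self.runs = []          # list of sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value          # O(1) buffered write
        if len(self.memtable) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Persisting a whole sorted run costs one large sequential write,
        # which is what makes the structure write-optimized.
        run = sorted(self.memtable.items())
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # freshest value wins
            return self.memtable[key]
        for run in self.runs:               # then newest run to oldest
            for k, v in run:
                if k == key:
                    return v
        return None
```

    Real systems bound the read cost by merging (compacting) runs in the background; Bε-trees take a different route, buffering updates inside the nodes of a B-tree and pushing them down in batches.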


    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    M2
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    9:00 am–12:30 pm

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files and thousands of concurrent jobs, and to deliver performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (Hadoop File System), and OpenStack.


    Half Day Afternoon

    Gary A. Grider, Los Alamos National Laboratory

    Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. As Division Leader, Gary is responsible for all aspects of High Performance Computing technologies and deployment at Los Alamos. Additionally, Gary is responsible for managing the R&D portfolio for keeping the new technology pipeline full to provide solutions to problems in the Lab’s HPC environment, through funding of university and industry partners.

    Gary is also the US Department of Energy Exascale Storage, IO, and Data Management National Co-Coordinator. In this role, Gary helps manage the U.S. government investments in Data Management, Mass Storage, and IO. Gary has 30 active patents/applications in the data storage area and has been working in HPC and HPC-related storage since 1984.

    John Bent, Seagate Government Solutions

    John Bent, Chief Architect for Seagate Government Solutions, has been working on storage systems for over 20 years. After completing his data-aware scheduling dissertation at Wisconsin in 2005, John spent the next 10 years working for Gary Grider at LANL designing, maintaining, and measuring some of the world's largest parallel storage systems. Now at Seagate Gov, John helps design and map Seagate storage products, both hardware and systems, to emerging workloads in both Enterprise and Extreme IO.

    Some of John’s more influential research has been the Parallel Log-structured File System and the DOE-sponsored FastForward project prototyping an exascale storage system with Intel and The HDF Group. John is a former anthropology major who spent two years spearfishing on the equator while working as a Peace Corps volunteer.

    M3
    Innovations, Challenges, and Lessons Learned in HPC Storage Yesterday, Today, and Tomorrow
    Gary A. Grider, Los Alamos National Laboratory

    John Bent, Seagate Government Solutions

    1:30 pm–5:00 pm

    In this tutorial, we will introduce the audience to the lunatic fringe of extreme high-performance computing and its storage systems. The most difficult challenge in HPC storage is caused by millions (soon to be billions) of simultaneously writing threads. Although cloud providers handle workloads of comparable, or larger, aggregate scale, the HPC challenge is unique because the concurrent writers are modifying shared data.

    We will begin with a brief history of HPC computing covering the previous few decades, bringing us into the petaflop era which started in 2009. Then we will discuss the unique computational science in HPC so that the audience can understand the unavoidability of its unique storage challenges. We will then move into a discussion of archival storage and the hardware and software technologies needed to store today’s exabytes of data forever. From archive we will move into the parallel file systems of today and will end the lecture portion of the tutorial with a discussion of anticipated HPC storage systems of tomorrow. Of particular focus will be namespaces handling concurrent modifications to billions of entries as this is what we believe will be the largest challenge in the exascale era.

    The tutorial will end with a free-ranging, audience-directed panel.

    Topics include:
    • A brief history lesson about the past 30 years of supercomputers
    • An understanding of what makes HPC computing unique and the entailing storage challenges
    • An overview of current HPC storage technologies such as burst buffers, parallel file systems, and archival storage
    • A glimpse into the future of HPC storage technologies for both hardware and software
    • Insights into unique research opportunities to advance HPC storage

    Andy Rudoff, Data Center Group, Intel Corporation

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    M4
    Persistent Memory Programming: Challenges and Solutions in Multiple Languages
    Andy Rudoff, Data Center Group, Intel Corporation

    1:30 pm–5:00 pm

    Both Windows and Linux now contain support for Persistent Memory, an emerging non-volatile memory (NVM) technology. Persistent Memory is available today in the form of NVDIMMs and is expected to explode in capacity in the near future. Unlike other NVM technologies, such as SSDs, Persistent Memory provides a byte-addressable programming model, allowing direct memory access like DRAM, but retaining its contents across power loss. Technologies such as Intel’s 3D XPoint are expected to provide terabytes of NVM per CPU socket, with performance near DRAM speeds. The result offers applications a new tier for data placement in addition to the traditional memory and storage tiers: the persistent memory tier. While there are numerous ways for an OS to leverage Persistent Memory in a way that is transparent to the application, converting an application to be "persistent memory aware" will allow the highest performance benefit.

    This tutorial will start with the basic SNIA NVM Programming Model used by operating systems to expose Persistent Memory to applications. We will walk through code examples showing how applications get access to Persistent Memory and we will pay special attention to safe programming practices such as flushing to persistence, atomic operations, and writing power-fail safe code. We will look at CPU instructions designed for atomic operations, cache flushing, and fencing, and how they interact with Persistent Memory.

    Next, the tutorial will provide a brief survey of available libraries, compilers, and research in this area. We will then walk through some more complex examples of persistent memory programming in C, C++, and Java. Using the open source NVM Libraries from http://pmem.io we will show how to solve the common programming pain points and how the higher-level languages can help avoid common persistent memory programming mistakes.
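
    To illustrate the flush-to-persistence discipline described above without persistent-memory hardware or libraries, here is a minimal Python sketch of ours using an ordinary memory-mapped file; `mmap.flush()` stands in for a true persistence primitive such as libpmem's pmem_persist, and the file name is arbitrary:

```python
# Illustrative "write, then explicitly flush to persistence" pattern,
# using plain mmap + msync as a stand-in for persistent-memory
# primitives. On real hardware this would be a DAX-mapped pmem region.
import mmap
import os
import struct

PATH = "pmem_demo.bin"

# Create a small zero-filled backing file to map.
with open(PATH, "wb") as f:
    f.write(b"\0" * 4096)

with open(PATH, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    # Stores are only guaranteed durable after an explicit flush; until
    # then they may sit in CPU caches or the page cache.
    m[0:8] = struct.pack("<Q", 42)
    m.flush()              # analogous to pmem_persist(addr, len)
    m.close()

# Reopen and verify the value survived.
with open(PATH, "rb") as f:
    value, = struct.unpack("<Q", f.read(8))
print(value)               # prints 42
os.remove(PATH)
```

    The power-fail-safety topics in the tutorial (atomic operations, fencing, cache-flush instructions) address what happens if failure strikes between the store and the flush, which this portable sketch cannot demonstrate.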

    Topics include:
    • The SNIA NVM Programming Model
    • How the Intel Architecture Supports Persistent Memory
    • The Challenges of Persistent Memory Programming
    • The Current State of the Persistent Memory Ecosystem
    • Programming Using the NVM Libraries from http://pmem.io
    • C, C++, and Java Persistent Memory Programming Techniques
    Sunday, October 29, 2017

    Half Day Morning

    Ryan Jarvinen, Red Hat

    Ryan Jarvinen is an Open Source Advocate at CoreOS, focusing on improving developer experience in the container community. He lives in Oakland, California, and is passionate about open source, open standards, open government, and digital rights. You can reach him as ryanj on Twitter, GitHub, and IRC.

    S1
    Hands-On Intro to Kubernetes
    Ryan Jarvinen, Red Hat

    9:00 am–12:30 pm
    Bayview Room

    Kick off your journey to becoming a DevOps master by learning Kubernetes from the ground up. Get started with an introduction to distributed systems and the architecture behind Kubernetes; then learn about Kubernetes APIs and API object primitives. By the end of this workshop you’ll be deploying, scaling, and automating container-based solutions using open source tools for distributed computing.

    Slides: http://bit.ly/lisa17-k8s. Bring a laptop with the following materials: http://bit.ly/lisa17-k8s#/workshop-setup.
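
    As a taste of the API object primitives covered, a minimal Deployment manifest looks roughly like this (the names and image here are illustrative placeholders, not workshop material):

```yaml
# Minimal illustrative Deployment: three replicas of a web container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3                 # declarative scaling: desired pod count
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:stable
        ports:
        - containerPort: 80
```

    Applying it with `kubectl apply -f deployment.yaml` asks the cluster to converge on three running replicas; `kubectl scale deployment hello-web --replicas=5` changes that desired count.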

    Who should attend:

    Developers, systems administrators, "DevOps" folks, architects, and anyone interested in learning about distributed systems via hands-on examples. Attendees should have some basic knowledge of Linux containers (Docker) and an interest in using distributed architectures to develop web solutions.

    Take back to work:

    Attendees will learn how to deploy, scale, update, and manage container-based solutions through hands-on examples and exercises.

    Topics include:

    Kubernetes, Distributed computing and solutions delivery, SRE, container operations


    Mark Lamourine, Red Hat

    Mark Lamourine fell into system administration when the VAX shop he worked in as a student inherited a set of HP/UX boxes. He became the de-facto admin because he was the only one in the group who had read a man(8) page. Since then he's done stints as a developer, a QA engineer, a lab infrastructure manager and infrastructure admin at a now-defunct world-wide ISP. These days he plays the Sysadmin Advocate to software developers who think software is done when they've installed it once in Vagrant.

    When not computer geeking Mark geeks on road bicycles. He's been riding road fixed-gear for fun since before that was a thing.

    S4
    Underpinnings—Container Hosts: CoreOS and Project Atomic
    Mark Lamourine, Red Hat

    9:00 am–12:30 pm
    Marina Room

    In this talk I outline the characteristics that define a "container host": an OS tuned to run software in containers. We will explore the benefits and peculiarities of a stripped-down, lightweight, minimal OS image and the implications for configuration management and update strategies.

    Then I explore the architecture of two common container hosts, CoreOS and Project Atomic. Each has characteristics that make it suitable for different environments. Users will install one of the two environments and follow along, probing and observing how a container host differs in operation from a conventional package-based host.

    Finally, I will look at how a sysadmin's day-to-day tasks and operations differ when running infrastructure services and providing application runtime environments for developers and users on container hosts. We will establish base network services (DNS, NTP, authentication) on container hosts, as well as install and demonstrate utility containers that provide the standard admin tools stripped from lightweight hosts.

    Who should attend:

    Sysadmins and service designers interested in learning to use container hosts to reduce host management.

    Participation requires access to local or cloud VM service.

    Take back to work:

    Attendees will understand the goals and basic design requirements for container hosts. They will get an overview of the design of both CoreOS and Atomic host, highlighting the differences in architecture and how these inform the choice of container host for an installation.

    They will learn how to boot and integrate container hosts into their existing infrastructure. They will know how to install and use traditional host tools from containers and how to manage, update and customize container hosts.

    They will create a sample cluster of either CoreOS or Atomic hosts in a demo environment.

    Topics include:
    • Container Hosts
    • Large Scale Container Infrastructure
    • Atomic Host and CoreOS architecture

    George Beech, Stack Exchange

    George has been an SRE generalist at Stack Exchange since October 2011. Before that, he worked for a multinational CRM company running their IVR infrastructure. He has worked on every part of the stack, from Windows to Linux to the network infrastructure. He is currently serving his first term as a LOPSA Director. His experience working in the IT field over more than a decade has led him to love working with multiple technologies, and has allowed him to experience everything from running a small network as a consultant to being part of a large team running very large-scale infrastructure.

    In the past he has spoken at LISA, Velocity NYC, local user groups, and LOPSA-EAST, and has written about his experience working on a high-volume web infrastructure on his personal blog and the Server Fault blog.

    S5
    Automating Azure with Resource Manager Templates
    George Beech, Stack Exchange

    9:00 am–12:30 pm
    Garden Room

    This class will teach administrators how to get a project up and running with Azure Resource Manager templates. These templates are an easy way to define, manage, and deploy instances into the Azure cloud. Additionally, I will go over some basic best practices for making your templates more manageable.
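
    For a flavor of what such a template looks like, here is a minimal sketch (the resource, API version, and parameter names are illustrative, not taken from the class materials), showing the parameters, variables, and resources sections working together:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": {
      "type": "string",
      "metadata": { "description": "Globally unique storage account name" }
    }
  },
  "variables": {
    "location": "[resourceGroup().location]"
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('storageAccountName')]",
      "apiVersion": "2016-01-01",
      "location": "[variables('location')]",
      "sku": { "name": "Standard_LRS" },
      "kind": "Storage",
      "properties": {}
    }
  ]
}
```

    A template like this is deployed with Azure PowerShell or the Azure CLI, with parameter values supplied at deployment time, so one generalized template can serve many environments.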

    Who should attend:

    System administrators who are new to Azure or have not worked with Resource Manager templates in the past, and anyone interested in streamlining and automating his or her workflow in the Azure cloud.

    Take back to work:

    Attendees will take back the basic skills to get started automating their Azure deployments, along with the baseline knowledge to work with ARM templates.

    Topics include:
    • Azure Resource Manager
    • Azure Powershell/Azure CLI
    • Basics of the ARM Template layout
      • Metadata
      • Parameters
      • Variables
      • Template file
    • Using Parameters, and variables to generalize your deployment
    • Adjusting resource sizing on the fly
    • Deploying Resources
      • Base resource
      • Sizing
      • Monitoring Configuration
    • Resource Dependencies
    • Troubleshooting Templates
    • Tips and tricks to help you configure templates

    Full Day

    S2
    Everything an Administrator Wanted to Know about Operating System Internals but Was Afraid to Ask
    9:00 am–5:00 pm
    Seacliff AB Room

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of "It's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis. As an added bonus, it will prime you for the following day's "Linux Performance Tuning" tutorial with Theodore Ts'o.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.

    Buckle up.
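
    To make one of those topics concrete, here is a small sketch of ours illustrating Bélády's anomaly: under FIFO page replacement, giving the classic reference string four frames instead of three produces more page faults, not fewer.

```python
# FIFO page replacement: count page faults for a reference string.
from collections import deque

def fifo_faults(refs, nframes):
    frames = deque()               # oldest resident page at the left
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()   # evict the oldest resident page
            frames.append(page)
    return faults

# Classic reference string exhibiting Bélády's anomaly:
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults with 3 frames
print(fifo_faults(refs, 4))   # 10 faults with 4 frames: more memory, more faults
```

    Stack algorithms such as LRU cannot exhibit this anomaly, which is one reason the distinction between replacement policies matters in practice.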


    Tyler Fitch, Adobe

    Tyler is a Site Reliability Engineer for the Adobe Stock site—working to automate all the things done to build and release changes to the Stock platforms. He recently finished three years of "post graduate work" in DevOps as an Architect in Chef's Customer Success Program where he helped Chef's largest enterprise customers have delightful experiences in IT Automation. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

    S3
    Automation Tools Bootcamp
    Tyler Fitch, Adobe

    9:00 am–5:00 pm
    Seacliff C Room

    The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform, and Artifactory to deploy a small application in local VMs.

    We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

    Who should attend:

    Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.

    Take back to work:

    These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

    Topics include:

    Vagrant, Chef, Packer, Docker, Terraform, and Artifactory


    Half Day Afternoon

    Michal Sekletar, Red Hat

    Michal Sekletar joined Red Hat in 2011 and currently works as a Senior Software Engineer on the "Plumbers" team. He spends his days working on and supporting init systems and other low-level user-space components. He holds a Master's degree from Brno University of Technology. His other professional interests include programming languages, algorithms, and UNIX-like (other than Linux) operating systems.

    S6
    Service Management with systemd
    Michal Sekletar, Red Hat

    1:30 pm–5:00 pm
    Bayview Room

    Tasks like the management and maintenance of business-critical services are on the daily to-do list of every system administrator. Moreover, containers and microservice-based architectures mean that the number of services a sysadmin has to manage is ever growing. To successfully manage thousands of services, we need smart tools that can help us. In this session, we will look at systemd, the init system and service manager used by all major Linux distributions. The session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who is new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
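
    For reference, a minimal service unit, the kind of file this session dissects, might look like this (the service name, binary path, and config path are illustrative):

```ini
# /etc/systemd/system/myapp.service -- illustrative example unit
[Unit]
Description=Example application service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
Restart=on-failure
User=myapp

[Install]
WantedBy=multi-user.target
```

    After placing the file under /etc/systemd/system/, `systemctl daemon-reload` followed by `systemctl enable --now myapp.service` starts the service and enables it at boot, and `journalctl -u myapp` shows its logs.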

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Understanding of how systemd works, where to find the configuration files, and how to maintain them.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system logging with the journal
    • Resource management via systemd's cgroups interface
    • Simple security management with systemd and the kernel's capabilities
    • systemd, containers, and virtualization
    S7
    One Metrics Framework to Rule Them All
    1:30 pm–5:00 pm
    Garden Room

    Have you ever wondered how to find the "one metric that matters" for your team? Or how to magically communicate why your team is doing what you're doing so everyone can understand? Or, moving back several steps: how should you decide which work to focus on? This tutorial isn't a magic pill, but it's the closest thing to one for answering all of those questions. And once you learn the framework, you'll be able to sketch it out on the back of a napkin.

    I’ve used this simple framework to help:

    • Fortune 500 executives decide on the right metrics for their latest initiatives and communicate them throughout the organization
    • Sysadmins communicate their latest improvement work across their own teams and to “the business”

    I’ve also used it in my own research, ranging from complex hardware studies to the State of DevOps Reports.

    The framework works for all types of measures: system, survey, technical, financial, etc.

    Who should attend:

    Engineers, managers, anyone needing to plan or understand a system.

    Take back to work:

    When you leave this tutorial, you’ll be able to:

    • Communicate your measurement framework in a straightforward manner
    • Identify key measures for your own improvement work, and share this easily with the data team (whether that’s you or another team)
    • Chain your measurement frameworks, allowing you to link executive-level initiatives to middle management goals to practitioner workstreams
    Topics include:

    Metrics

    Additional Materials:

    Carol Smith, Microsoft

    Carol Smith has over 12 years of experience with programs, communities, and partnerships. She worked at GitHub managing education partnerships for the Student Developer Pack and at Google managing the Google Summer of Code program. She has a degree in Journalism from California State University, Northridge, and is a cook, cyclist, and horseback rider.

    Heidi Waterhouse, Consultant

    Heidi Waterhouse is a freelance technical writer, information architect, and active conference speaker. Her experience as an in-demand consultant has given her insight into the interview process across several industry segments and allows her to generate meaningful answers to a wide variety of weird interview questions. In her spare time, she considers the technical writing aspects of sewing patterns.

    S8
    The Hardest Problem in Tech(nical Interviewing) Is People: The Personal Skills in Interviewing
    Carol Smith, Microsoft

    Heidi Waterhouse, Consultant


    1:30 pm–5:00 pm
    Marina Room

    Technical interviews can be intimidating, but it’s easier if you have confidence in yourself and your ability to answer complicated questions. The hardest questions are not about sorting algorithms, but how you’ll work in a team, how you’ll resolve conflicts, and what it will be like to manage and work with you. This workshop exists to address the skills and theories of presenting yourself as confident, capable, and coachable.

    Who should attend:

    We envision the audience for this tutorial to be people interviewing for technical or technical-adjacent roles at technology companies who are early career (2-7 years). It is meant for beginners, but all are welcome if they want to brush up on their interviewing skills.

    Take back to work:

    The audience will experience hands-on practice, and can expect to learn tactics for preparing for and excelling at interviews. We will provide handouts for participants to use after the workshop and for practice. Participants will learn how to accomplish the checkpoints of a hiring workflow, including: phone screens, phone interviews, in-person interviews, and how to accept or reject an offer. The take-home worksheets will provide types of interview questions, job search rubric, self-evaluation forms, and resources for further research.

    Topics include:

    Culture, Interviewing, Career, Early Career, Technology Industry

    Additional Materials:
    Monday, October 30, 2017

    Full Day

    Branson Matheson, Cisco Systems, Inc.

    Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson has several credentials and generally likes to spend time responding to the statement "I bet you can't...."

    M1
    Defending against the Dark Arts
    Branson Matheson, Cisco Systems, Inc.


    9:00 am–5:00 pm
    Bayview Room

    Today's threats to the enterprise are manifested in many ways, but all share similar traits: highly intelligent, well-funded, and determined to gain access. In this class, we will explore the murky world of the black-hats. We will examine your security footprint as they view it and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. This class has been updated with current events and topics related to environment profiling, social engineering, and new attack vectors. As with all my classes, this will be accompanied with a pinch of humor and a large dollop of common sense.

    Who should attend:

    Participants should be beginning- to mid-level system administrators of any stripe with an interest in IT security and a desire to understand their potential adversaries. It is suggested that participants have experience with the *nix command line and virtual hosts.

    Take back to work:

    Tools, tips, tricks, and a working security toolkit which can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

    Topics include:

    Security, Risk Evaluation, Social Engineering

    Additional Materials:

    Half Day Morning

    M2
    The Accidental DBA
    9:00 am–12:30 pm
    Garden Room

    Open source relational databases like MySQL and PostgreSQL power some of the world's largest websites, including Yelp. They can be used out of the box with few adjustments and rarely require a dedicated database administrator for the first few months or even years. This means that system administrators and site reliability engineers are usually the first to respond to some of the more "interesting" issues that can arise as you scale your databases. This tutorial will cover MySQL, but many of the concepts apply to PostgreSQL and other open source RDBMSs. We'll first go over a broad set of DBA basics to introduce MySQL database administration, then cover the InnoDB storage engine, database defense, and monitoring. Finally, I'll cover the wide array of online resources, books, open source toolkits, and scripts from MySQL, Percona, and the open source community that will make the job easier.

    Who should attend:

    Sysadmins and SREs of all levels who have an interest or need to learn MySQL or supporting an open source relational database.

    Take back to work:

    Sysadmins and SREs who join us for this tutorial will come away with a real-world, production-ready understanding of why and how MySQL works the way it does.

    Topics include:
    • MySQL Installation and Configuration
    • Architecture and Filesystem Layout
    • InnoDB Tuning and Optimization
    • Transactions
    • Replication and Scaling Out
    • Schema/Query Basics, Indexes, and Query Plans
    • Deciphering Common Errors
    • Monitoring
    • Backup and Restore
    • Troubleshooting
    • Online Communities
    • Open Source Toolkits
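
    As a small taste of the query-plan material above, MySQL's EXPLAIN statement shows whether a query will use an index; the table and column names here are hypothetical:

```sql
-- Does this query use the index on `email`, or scan the whole table?
EXPLAIN SELECT id, email
FROM users
WHERE email = 'user@example.com';
-- In the output, check the `type` and `key` columns: `ref` with a named
-- key means an index lookup; `ALL` with a NULL key means a full table scan.
```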
    Additional Materials:

    Robert Ballance, Independent Computer Scientist

    Dr. Robert Ballance recently completed a White House Presidential Innovation Fellowship where he applied his skills with R to analyzing and delivering broadband deployment data to communities across the U.S.A. He first developed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the American Association for the Advancement of Science. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989.

    M3
    Introduction to R for System Administrators
    Robert Ballance, Independent Computer Scientist


    9:00 am–12:30 pm
    Seacliff C Room

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline — coming up to speed on R, accessing data, and getting results.

    This tutorial will

    • motivate you to pick up R
    • introduce the basics of the R language
    • demonstrate useful techniques using R and RStudio
    • illustrate ways to simplify your life by automating data analysis and reporting

    In-class demonstrations will be complemented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided.
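
    As a flavor of what "accessing data and getting results" looks like, a few lines of R using the dplyr package can summarize an operational data set; the file name and column names below are hypothetical:

```r
# Read a CSV of per-host response times and summarize by host.
library(dplyr)

times <- read.csv("response_times.csv")
times %>%
  group_by(host) %>%
  summarise(p95_ms = quantile(latency_ms, 0.95), requests = n())
```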

    Who should attend:

    This tutorial is designed for system administrators who are awash in operational data and who want to do a more efficient job of understanding their data and communicating their findings to others. Some facility with programming and a knowledge of basic descriptive statistics are assumed. Prior knowledge of R is not required.

    Take back to work:
    • Understanding where R fits into the system administrator’s tool set
    • Acquaintance with R, R packages, and R Studio
    • Familiarity with basic R data-manipulation techniques
    • Motivation to learn or improve your R skills
    • Next steps to take in mastering R
    Topics include:

    Analytics of System Data

    Additional Materials:

    Christopher DeMarco, Rhiza, a Nielsen Company

    Sysadmin-turned-IT-turned-DevOps, I've been coding automation since 1998.

    M4
    Deployment and Orchestration with Terraform
    Christopher DeMarco, Rhiza, a Nielsen Company


    9:00 am–12:30 pm
    Seacliff AB Room

    Terraform is a tool for deploying and configuring cloud infrastructure in AWS, Google Compute Engine, Digital Ocean, Azure, and many, many other platforms. It is a consistent, robust, well-maintained alternative to clicking in a web interface or writing custom provisioning code against the cloud provider's API.

    This tutorial will show code and runtime examples of deploying various types of cloud infrastructure in AWS, Google Compute Engine, and others. Interactivity is unfortunately not offered due to the logistics of billing for arbitrary cloud resources.
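
    For a sense of what that code looks like, a minimal Terraform configuration for a single AWS instance might read as follows; the AMI ID and names are placeholders:

```hcl
# Minimal sketch: one AWS instance (all values are illustrative).
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "terraform-example"
  }
}
```

    `terraform init`, `terraform plan`, and `terraform apply` then create the instance; `terraform destroy` tears it down again.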

    Who should attend:

    Novice- to intermediate-level sysadmins who want to learn what Terraform is and what it's good for, why you'd use it instead of your cloud provider's web interface or API, and how to implement common patterns across several different providers.

    Take back to work:

    What is Terraform? What is it good for? How do we use it to build/manage infrastructure? How do we scale it to a team?

    Topics include:

    Terraform

    Additional Materials:

    Tom Limoncelli, Stack Overflow, Inc.

    Tom is the SRE Manager at StackOverflow.com and author of Time Management for System Administrators (O'Reilly). He is co-author of The Practice of System and Network Administration (3rd edition just released) and The Practice of Cloud System Administration. He is an internationally recognized author, speaker, system administrator, and DevOps advocate. He's previously worked at small and large companies including Google, Bell Labs/Lucent, and AT&T. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    M5
    Personal Time Management: The Basics for Sysadmins That Are Overloaded
    Tom Limoncelli, Stack Overflow, Inc.


    9:00 am–12:30 pm
    Marina Room

    Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.

    This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • How to manage all the work you have to do.
    • How to prioritize and eliminate unnecessary tasks.
    • Manage interruptions: prevent them when possible, and handle the ones you get.
    • The Cycle System for recording and processing to-do lists
    • Task grouping: batching, sharding, and multitasking
    Additional Materials:

    Half Day Afternoon

    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    M6
    Building a Process to Manage the Problems that Cause Incidents
    Jeanne Schock, Armstrong Flooring Inc.


    1:30 pm–5:00 pm
    Marina Room

    All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer—restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.

    The goals of the Problem Management Process are to prevent repeat incidents and to minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them to be measurably more consistent, mature and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process.

    Who should attend:

    Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.

    Take back to work:
    • a step-by-step guide for building and implementing a problem process and the reasons behind each step
    • a process template with examples that can be easily adapted to fit your organization’s current and future needs
    • instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
    • guidance for getting buy-in from peers and managers
    • a complete kit for starting to use After Action Reviews to handle the human component of problems
    Topics include:
    • Incident response vs. problem resolution
    • Root cause analysis techniques
    • Making decisions that are aligned with business objectives
    • Getting buy-in from teammates, colleagues and managers
    • Proactive problem management
    • After-action reviews
    Additional Materials:

    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and performed various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.
    M7
    Getting Started with Docker and Containers
    Jérôme Petazzoni, Docker Inc.
    1:30 pm–5:00 pm
    Garden Room

    If you still haven't checked that Docker thing, but need (or want) to get started with containers, this tutorial is for you!

    After a short introduction explaining various usage scenarios for containers, we will roll up the sleeves of our T-shirts, and run a few simple containers through the Docker CLI. We will explain the difference between containers and images, and write a Dockerfile to build an image for a trivial application. Finally, we will present Compose, a tool to build, run, and manage stacks with multiple containers.
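
    For a taste of the image/container distinction, here is a trivial Dockerfile for a hypothetical Python script, app.py:

```dockerfile
# Build an image for a hypothetical single-file Python application.
FROM python:3-alpine
COPY app.py /app.py
CMD ["python", "/app.py"]
```

    `docker build -t myapp .` produces an image; `docker run myapp` starts a container from it.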

    No prior knowledge of Docker is needed. If you know how to interact with the UNIX command line, you're set! Some demos will feature code snippets in Python, Ruby, or even C; but you will be perfectly fine even if your language of choice is Bash.

    Advanced topics like networks, volumes, plugins, multi-stage builds, health checks, etc. will be mentioned but not covered in depth.

    The tutorial will be hands-on. You will be provided with a pre-configured Docker environment running on a cloud VM (you won't need to setup Docker or Vagrant or VirtualBox on your machine).

    Who should attend:

    Devs and ops who have managed to avoid the container hype so far but now want to catch up on all that Docker jazz

    Take back to work:

    The audience will learn about the basic principles of containers: what they are, what they're for, why they have been trending the last few years.

    They will also learn how to use the Docker CLI to run simple containers; build container images with Dockerfiles; start multi-container applications with Docker Compose.

    This will allow them to understand containers in general and Docker in particular; use them in simple scenarios; and have a reference point for more complex ones.

    Topics include:

    Docker, containers

    Additional Materials:

    Christopher DeMarco, Rhiza, a Nielsen Company

    Sysadmin-turned-IT-turned-DevOps, I've been coding automation since 1998.

    M8
    Automation with Ansible
    Christopher DeMarco, Rhiza, a Nielsen Company


    1:30 pm–5:00 pm
    Seacliff AB Room

    Ansible is a fantastic starting point for automation—either when the learning curve or the infrastructure around Chef/Puppet is too high. New users can start writing useful automation playbooks with just an SSH connection and an hour (or two) reading the docs.
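
    To give a sense of that low barrier to entry, a first playbook might look like the following sketch; the host group and package are illustrative:

```yaml
# site.yml -- minimal playbook sketch (host group and package are examples)
- hosts: webservers
  become: true
  tasks:
    - name: Ensure nginx is installed
      package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      service:
        name: nginx
        state: started
        enabled: true
```

    `ansible-playbook -i inventory site.yml` applies it over plain SSH; the targets need nothing beyond Python.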

    This tutorial will alternate between lecture and hands-on activities using (instructor-supplied) disposable cloud infrastructure.

    Who should attend:

    Sysadmins with zero exposure to Ansible through intermediate-level users who want a guided tour of its potential.

    Take back to work:

    Knowledge of what Ansible is, how it works, and how it compares with other configuration-management tools; hands-on experience using Ansible to solve real-world problems; and opinionated best-practices for saving blood, sweat, and/or tears.

    Topics include:

    Ansible

    Additional Materials:

    Courtney Eckhardt

    Courtney Eckhardt first got into retrospectives when she signed up for comp.risks as an undergrad (and since then, not as much has changed as we’d like to think). Her perspectives on engineering process improvement are strongly informed by the work of Kathy Sierra and Don Norman (among others).

    M9
    "I Never Want to Live through This Again!": Running Excellent Retrospectives
    Courtney Eckhardt


    1:30 pm–5:00 pm
    Seacliff C Room

    Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?

    Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.

    This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.

    Who should attend:

    Managers, tech leads, anyone interested in retrospective culture and iterating on processes.

    Take back to work:

    Attendees will have the materials and firsthand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.

    Topics include:
    • Why run a retrospective
    • Goal of a retrospective
    • Blameless retrospectives
    • Facilitating: redirecting blame, reframing, drawing people out
    • How to structure a retrospective
    • Preparing for a retrospective
    • Five "why"s / infinite "how"s
    • Human error
    • Achieving follow-through on remediation items
    Additional Materials:
    Tuesday, October 31, 2017

    Full Day

    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    T5
    Linux Performance Tuning
    Theodore Ts'o, Google


    9:00 am–5:00 pm

    Bayview Room

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • Filesystem and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies
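
    As a preview of the measurement topics above, a first-pass triage often starts with standard utilities before reaching for heavier profilers:

```shell
# A quick first pass at "finding bottlenecks" (coreutils/procps tools):
uptime              # load averages: is the machine busy at all?
free -m             # memory and cache usage in megabytes
df -h               # filesystem capacity; full disks cause odd slowness
cat /proc/loadavg   # the raw numbers behind uptime's load averages
```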
    Additional Materials:

    Half Day Morning

    Chris McEniry, Sony Interactive Entertainment

    Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

    T1
    Close to the Edge Systems Administration in Go
    Chris McEniry, Sony Interactive Entertainment


    9:00 am–12:30 pm
    Seacliff C Room

    There are many times when the daily grind pushes you out of your comfort zone. Sometimes you're in a bind, and the best way forward is fashioning a tool out of what's available. Sometimes those really are nails you see around you. This class looks at some of the normal, and some of the not so normal, uses for Golang in systems administration.

    Who should attend:
    • New Golang programmers who want to get a better idea of using the language (should have some familiarity with Golang).
    • Old dogs looking for new tricks.
    Take back to work:
    • Several MacGyver tools that may come in handy.
    • Techniques and approaches for some out of the box thinking.
    Topics include:
    • Running a quick-and-dirty TLS-secured web server for file transfers
    • Collecting and serving up system metrics
    • Driving web applications from the command line
    • Speaking HTTP/2
    • Fanning out shell commands from one system to many over SSH
    • Rolling your own container system
    • and more
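
    One of the tricks in that list, fanning out work to many systems, rests on Go's goroutines; here is a minimal sketch in which the SSH call is replaced by a placeholder function and all names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut runs fn against every target concurrently and collects the
// results in input order. In a real tool fn would run a command over
// SSH; here it is any function from a hostname to its output.
func fanOut(targets []string, fn func(string) string) []string {
	results := make([]string, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			results[i] = fn(t)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	hosts := []string{"web1", "web2", "db1"}
	for _, line := range fanOut(hosts, func(h string) string {
		return "uptime from " + h // stand-in for an SSH command
	}) {
		fmt.Println(line)
	}
}
```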
    Additional Materials:

    Sasha Goldshtein, CTO, Sela Group

    Sasha Goldshtein is the CTO of Sela Group, a Microsoft MVP, Pluralsight author, and international consultant and trainer. Sasha is the author of two books and multiple online courses, and a prolific blogger. He is also an active open source contributor to projects focused on system diagnostics, performance monitoring, and tracing—across multiple operating systems and runtimes. Sasha authored and delivered training courses on Linux performance optimization, event tracing, production debugging, mobile application development, and modern C++. Between his consulting engagements, Sasha speaks at international conferences worldwide.

    T2
    Linux Performance Monitoring with BPF
    Sasha Goldshtein, CTO, Sela Group


    9:00 am–12:30 pm
    Garden Room

    eBPF (extended Berkeley Packet Filters) is a modern kernel technology that can be used to introduce dynamic tracing into a system that wasn't prepared or instrumented in any way. The tracing programs run in the kernel, are guaranteed to never crash or hang your system, and can probe every module and function—from the kernel to user-space frameworks such as Node and Ruby.

    In this workshop, you will experiment with Linux dynamic tracing first-hand. First, you will explore BCC, the BPF Compiler Collection, which is a set of tools and libraries for dynamic tracing. Many of your tracing needs will be answered by BCC, and you will experiment with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the "baked" tools for file I/O, network, and CPU analysis. You'll be able to choose between working on a set of hands-on labs prepared by the instructors, or trying the tools out on your own test system.

    Next, you will hack on some of the bleeding edge tools in the BCC toolkit, and build a couple of simple tools of your own. You'll be able to pick from a curated list of GitHub issues for the BCC project, a set of hands-on labs with known "school solutions", and an open-ended list of problems that need tools for effective analysis. At the end of this workshop, you will be equipped with a toolbox for diagnosing issues in the field, as well as a framework for building your own tools when the generic ones do not suffice.
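    Many of the "baked" BCC tools summarize events in-kernel as power-of-two histograms before handing only the aggregate to user space. As a taste of that idea, here is a pure-Python sketch of the same bucketing (the sample latency values are invented for illustration; the real tools do this inside a BPF map):

```python
# Sketch of the power-of-two bucketing that BPF histogram tools use
# in-kernel; the latency values below are invented for illustration.
from collections import Counter

def log2_slot(value: int) -> int:
    """Bucket index for a positive value: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ..."""
    slot = 0
    while value > 1:
        value >>= 1
        slot += 1
    return slot

def histogram(values):
    """Aggregate values into log2 buckets, as a BPF map would."""
    return Counter(log2_slot(v) for v in values)

if __name__ == "__main__":
    latencies_us = [3, 5, 6, 12, 13, 14, 120, 130, 4000]
    for slot, count in sorted(histogram(latencies_us).items()):
        lo, hi = 1 << slot, (1 << (slot + 1)) - 1
        print(f"{lo:>6} -> {hi:<6} us : {'*' * count}")
```

    Aggregating in the kernel and exporting only bucket counts is what keeps these tools low-overhead even on hot code paths.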

    Who should attend:

    Developers, SREs, and ops engineers

    Take back to work:

    Low-overhead, production-ready tools based on the BPF kernel technology for CPU sampling, memory leak analysis, I/O and file issues, and many other performance and troubleshooting scenarios.

    Topics include:

    Performance, Monitoring, Tracing, BPF, Kernel

    Additional Materials:

    Mike Weilgart, Vertical Sysadmin, Inc.

    Mike Weilgart has loved maths and computers all his life. Graduating from high school at the age of 13, he thereafter worked in a variety of roles, including software QA, calculus teaching, and graphic design, before resolving to put his love of computers to professional use as a Linux sysadmin and trainer. Mike currently consults at a Fortune 50 company as an automation specialist, and enjoys nothing more than training people to full mastery of their tools.

    T3
    Git Foundations: Unlocking the Mysteries
    Mike Weilgart, Vertical Sysadmin, Inc.

    Mike Weilgart has loved maths and computers all his life. Graduating from high school at the age of 13, he thereafter worked in a variety of roles, including software QA, calculus teaching, and graphic design, before resolving to put his love of computers to professional use as a Linux sysadmin and trainer. Mike currently consults at a Fortune 50 company as an automation specialist, and enjoys nothing more than training people to full mastery of their tools.

    9:00 am–12:30 pm
    Seacliff AB Room

    Dozens of commands! Hundreds of options! Git has dumbfounded sysadmins and developers alike since its appearance in 2005.

    And yet, this ingenious software is among the most fantastically useful ever developed.

    Learn Git from the ground up and the inside out with Git Foundations Training!

    This half-day class explores Git's internals in depth and includes unique practical exercises to gain familiarity and comfort in handling the nuts and bolts.

    Bring with you:

    • A laptop with a UNIX-like command-line environment on which "git --version" displays a version (any version).
    • A willingness to learn.
    Who should attend:

    No prior knowledge of Git is required. Basic Unix/Linux command line experience is assumed. Experienced users of Git have given rave reviews; the class is not aimed only at beginners, but at anyone wishing to thoroughly understand and use Git to the fullest.

    Take back to work:
    • A thorough and practical understanding of the internals of Git
    • The ability to easily and *confidently* manipulate Git repositories and their contents
    • Readiness to pick up and *quickly* learn more exotic and advanced Git commands (and to read the man pages easily!)
    Topics include:

    Git Internals are covered in depth, beginning from basic definitions and proceeding through the essentials of Graph Theory needed to appreciate Git's architecture. Plenty of audience Q&A throughout, live demonstrations, and diagrams. Following this complete theory portion comes the practical portion of the course, with hands-on exercises to ensure retention and application of all theory.
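    For instance, the very first definition in that theory portion—Git stores content as objects named by a hash over a typed header plus the content—can be reproduced in a few lines. This sketch (in Python rather than Git itself) computes the same ID that `git hash-object` prints for a blob:

```python
import hashlib

def blob_oid(content: bytes) -> str:
    """Git blob object ID: SHA-1 over the header 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

if __name__ == "__main__":
    # Matches: printf 'hello\n' | git hash-object --stdin
    print(blob_oid(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

    Everything else in `.git/objects` (trees, commits, tags) is named the same way, which is why the class can build up the whole object graph from this one rule.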

    Additional Materials:

    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company, and a provider of Top-Level Domain registry and DNS services. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    T4
    Speed Up Your Change Control: Streamline Your Change Process without Increasing Risks
    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company, and a provider of Top-Level Domain registry and DNS services. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    9:00 am–12:30 pm
    Marina Room

    Speedy Change Control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risks. In The Visible Ops Handbook, authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: “change management does not slow things down in these organizations.” This tutorial will help anyone wishing to implement phase one of the handbook: “Stabilize the Patient” and “Modify First Response.” While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past success and failure. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert with over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services running on complex technologies across international data centers.

    Who should attend:

    Individuals and managers involved in preparing for and deploying changes and software builds in production environments.

    Take back to work:
    • templates for change request types and procedures
    • templates for creating standard operating procedures
    • ITIL-aligned talking points for making your case for these process improvements
    Topics include:
    • Change management
    • Process
    • Different change types to help you speed up the process
    • Assessing risks and potential impact
    • Defining change authorities specific for each change type
    • Metrics for measuring change process performance against goals
    • Release and deployment management
    • Continuous delivery
    Additional Materials:

    Half Day Afternoon

    Camille Baldock, Salesforce

    Camille Baldock is an infrastructure engineer with the Heroku Department of Data. She works on distributed systems monitoring, operations, automation, and tuning for Heroku Postgres.

    T6
    Troubleshooting Performance Issues in PostgreSQL
    Camille Baldock, Salesforce

    Camille Baldock is an infrastructure engineer with the Heroku Department of Data. She works on distributed systems monitoring, operations, automation, and tuning for Heroku Postgres.

    1:30 pm–5:00 pm
    Marina Room

    This tutorial will give you ways of diagnosing and preempting PostgreSQL performance issues using a wide range of tools and techniques to measure and improve your database's performance. We will cover query optimisation, configuration, and OS settings for your database server and pooling, caching, replication, and partitioning strategies that can be used to ensure performance at scale.

    Who should attend:

    The target audience for this tutorial is server administrators and developers working with PostgreSQL, or considering using it. No specific knowledge of PostgreSQL is required, but some background in RDBMS or SQL is recommended.

    Take back to work:

    System administrators will benefit by learning about:

    • what aspects of server and PostgreSQL configuration affect database performance and how to choose and tweak them
    • how to monitor the database server to maintain high performance

    Developers will benefit by learning about:

    • detecting performance issues in their database usage
    • optimising their queries
    Topics include:
    • This tutorial is organized around the various potential causes of performance issues in PostgreSQL: how to diagnose, fix, and monitor them
    • Query performance issues
    • Choosing the right PostgreSQL configuration within hardware and OS limitations
    • Operating system and hardware tweaks that can affect performance
    • Optimising database usage
    • Monitoring the performance of your database and database servers
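    As one concrete example of the monitoring theme, a common first check is the buffer cache hit ratio, derived from the blks_hit and blks_read counters PostgreSQL exposes in pg_stat_database. A small sketch of the arithmetic (the counter values and the ~99% rule of thumb are illustrative, not from the course materials):

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers.

    The two arguments correspond to the pg_stat_database counters;
    as a rough rule of thumb, a ratio persistently below ~0.99 on an
    OLTP workload suggests shared_buffers or the working set deserves
    a closer look.
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

if __name__ == "__main__":
    # Invented counter values for illustration.
    print(f"hit ratio: {cache_hit_ratio(990_000, 10_000):.2%}")
```

    The same division can of course be done in SQL directly against pg_stat_database; the point is to trend it over time rather than read it once.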
    Additional Materials:

    John Looney, Intercom

    John Looney is an SRE at Intercom, pretending to be a Product Engineer, improving infrastructure and reliability while pretending to also add features customers want.

    Previously, he spent a decade in Google SRE running GFS, Borg, Colossus, Chubby, Datacenter Automation, Ads Quality pipelines and Ads Serving systems.

    He has been on the programme committee of SRECon Dublin for the last three years, and presented a 'Large Scale Design' tutorial at LISA in 2012.

    T7
    Distributed Systems Building Blocks
    John Looney, Intercom

    John Looney is an SRE at Intercom, pretending to be a Product Engineer, improving infrastructure and reliability while pretending to also add features customers want.

    Previously, he spent a decade in Google SRE running GFS, Borg, Colossus, Chubby, Datacenter Automation, Ads Quality pipelines and Ads Serving systems.

    He has been on the programme committee of SRECon Dublin for the last three years, and presented a 'Large Scale Design' tutorial at LISA in 2012.

    1:30 pm–5:00 pm
    Seacliff AB Room

    All distributed systems make tradeoffs and compromises. Different designs behave very differently with respect to cost, performance, and how they behave under failure conditions.

    It's important to understand the tradeoffs that the building blocks in your systems make, and the implications this has for your system as a whole. In this workshop we'll look at several examples of different real-world distributed systems and discuss their strengths and shortcomings.

    This workshop will include some practical elements. Attendees will be given some system designs to read and to evaluate, and then we'll discuss the implications of each design together as a group.

    Who should attend:

    People working with distributed systems who want to fill in the blanks as to what "distributed systems" are supposed to be.

    Take back to work:

    Attendees will know the basic building blocks of distributed systems and how to choose between different implementations as needed.

    They will also know the names and basic details of common distributed-systems patterns, why they exist, and what happens when they are not applied correctly.

    Topics include:

    Distributed Systems Primer
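    One staple building block such a primer often covers is consistent hashing, which keeps key placement stable as nodes join and leave. A minimal sketch (the vnode count and hash choice here are arbitrary, and which blocks the workshop actually covers is up to the instructor):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: removing a node only remaps the
    keys that node owned, instead of reshuffling the whole keyspace."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str):
        # Each node gets several virtual points to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key: str) -> str:
        # A key belongs to the first virtual point at or after its hash.
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[i][1]
```

    Compare this with modulo hashing (`hash(key) % len(nodes)`), where removing one node remaps nearly every key—exactly the kind of trade-off discussion the workshop is about.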

    Additional Materials:

    Aleksey Tsalolikhin, Vertical Sysadmin, Inc.

    Aleksey Tsalolikhin is a practitioner in the area of Operations of information systems. Aleksey's mission is to improve the lives of fellow practitioners through effective training in excellent technologies. Aleksey is the principal at Vertical Sysadmin, which provides on-site training on UNIX shell basics, version control with Git, Configuration Management, Continuous Integration/Continuous Deployment, SQL basics and more.

    T8
    Setting up CI/CD Pipelines
    Aleksey Tsalolikhin, Vertical Sysadmin, Inc.

    Aleksey Tsalolikhin is a practitioner in the area of Operations of information systems. Aleksey's mission is to improve the lives of fellow practitioners through effective training in excellent technologies. Aleksey is the principal at Vertical Sysadmin, which provides on-site training on UNIX shell basics, version control with Git, Configuration Management, Continuous Integration/Continuous Deployment, SQL basics and more.

    1:30 pm–5:00 pm
    Garden Room

    Attendees will learn how CI/CD pipelines can increase IT velocity (from Dev to Ops), increase code quality, and lower risk, and will learn how to implement CI/CD pipelines in two popular tools, Jenkins and GitLab CI.

    Who should attend:

    Infrastructure engineers, system administrators, or DevOps engineers familiar with Git who have to set up or support CI/CD pipelines.

    Take back to work:

    Familiarity with CI/CD concepts; ability to implement CI/CD pipelines using popular tools such as Jenkins and GitLab CI.

    Topics include:
    • Introduction and orientation
      • Origin of Continuous Integration (CI) at ThoughtWorks
      • Widespread adoption; how CI relates to DevOps
      • Basic tasks: Build, Test, Deploy
    • Jenkins
      • Overview and Architecture
      • Definition of Key Terms
      • Building, Testing and Deploying (with hands-on lab)
      • Checking Pipeline status with Jenkins Blue Ocean UI
      • Troubleshooting
    • GitLab CI
      • Architecture: GitLab, GitLab CI Multi Runner, ephemeral test environments
      • Definitions: pipeline, stage, job, build, runner, environment, artifact, cache
      • Setting up runners: adding job runners; host instance types (shell, Docker, ssh, etc.); runner/job tags
      • Building, Testing, and Deploying (with hands-on lab)
      • Troubleshooting: build logs; enabling verbose builds; increasing "loglevel"; interactive access to containers
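    The GitLab CI pieces above come together in a .gitlab-ci.yml file at the repository root. A minimal illustrative sketch, using standard GitLab CI keywords (the stage and job names, image, and script commands are invented, not course material):

```yaml
# Illustrative .gitlab-ci.yml: three stages, one job each.
stages:
  - build
  - test
  - deploy

build-job:
  stage: build
  image: python:3.11          # job runs in an ephemeral container
  script:
    - pip install -r requirements.txt

test-job:
  stage: test
  image: python:3.11
  script:
    - pytest

deploy-job:
  stage: deploy
  script:
    - ./deploy.sh             # placeholder deploy step
  only:
    - master
```

    Jobs in the same stage run in parallel on available runners; a stage starts only after the previous stage's jobs succeed.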
    Additional Materials:

    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PAAS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T9
    Build, Ship, and Run Microservices on a Docker Swarm Cluster
    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PAAS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    1:30 pm–5:00 pm
    Seacliff C Room

    In this tutorial, you will set up your own Docker cluster, using the native orchestration features provided by the SwarmKit library. (SwarmKit has been integrated with the Docker Engine since Docker 1.12.)

    Then, you will use that cluster to deploy and scale a sample application architected around microservices.

    We will cover deployment tips, service discovery, load balancing; we will show how to integrate Swarm and Compose to obtain a seamless, automated "dev-to-prod" workflow; and we will show how to collect logs and metrics on a containerized platform.

    To get the most out of this tutorial, you should already be familiar with Docker! If you plan to attend this just after the other tutorial "Getting started with Docker and containers," you will definitely have to mind the gap.

    The tutorial will be hands-on; each attendee will be provided with a cluster of Docker nodes running on cloud VMs. The only software required on your machine is an SSH client (and a web browser).
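    For orientation, the dev-to-prod workflow mentioned above typically centers on a Compose-format file with a deploy section, handed to docker stack deploy. A minimal illustrative sketch, assuming a Docker release new enough to support stacks (1.13+); the image and service names are placeholders:

```yaml
# Illustrative stack file for: docker stack deploy -c stack.yml myapp
version: "3"
services:
  web:
    image: myorg/web:latest     # placeholder application image
    ports:
      - "80:80"
    deploy:
      replicas: 4               # Swarm schedules 4 tasks across the cluster
      update_config:
        parallelism: 2          # rolling update, two tasks at a time
  redis:
    image: redis:alpine
```

    The same file format drives local development with Compose, which is what makes the Swarm-plus-Compose workflow feel seamless.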

    Who should attend:

    Folks who were excited by (or forced to deploy) Docker Swarm, but want to go beyond the trivial prototype, implement a seamless dev-to-prod workflow, and tackle logging, metrics, security, etc.

    Take back to work:

    After this tutorial, the audience will know how to map their existing "ops knowledge" of traditional platforms to container platforms.

    Topics include:

    Docker, cluster, Swarm, orchestration, containers

    Additional Materials:
    Monday, February 12, 2018

    Half Day Morning

    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    M1
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    Grand Ballroom ABCD

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include:
    • SSD technology
    • Scaling the data path
    • Scaling metadata
    • Fault tolerance
    • Manageability
    • Cloud storage

    Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (the Hadoop File System), BigTable, LevelDB, and Google's Colossus File System.

    Additional Materials:

    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching. He worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and by grants from the Icelandic Center for Research.

    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was the R&D tech lead for the DRS team and a co-inventor of flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    M2
    Caches for the Persistent Memory and Flash Era
    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching. He worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and by grants from the Icelandic Center for Research.

     
    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was the R&D tech lead for the DRS team and a co-inventor of flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    9:00 am–12:30 pm

    Grand Ballroom EFGH

    For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy improvements has been highly motivated and continues with strong improvements to this day. However, there are certain areas in cache algorithms optimization that have only recently experienced breakthroughs.

    In this tutorial, we will start by reviewing the history of the caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies that are getting more complex and deep due to hardware innovations. These hierarchies and the workloads they generate motivate revisiting multi-tier algorithms. We will then describe cache utility curves and review recent literature that has made them easier to compute. Using this tool, we will excavate around caching policies and their trade-offs in different contexts. We will also spend some time thinking about optimality for caches.
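    A cache utility curve (miss-ratio curve) for LRU can be computed in one pass over a reference trace with Mattson's classic stack algorithm; much of the recent literature the tutorial reviews is about computing it approximately at far lower cost. A naive O(N·M) sketch:

```python
def lru_miss_ratio_curve(trace, max_size):
    """Naive Mattson stack algorithm: miss ratio of an LRU cache of
    each size 1..max_size over the given reference trace."""
    stack = []                        # LRU stack, most recently used last
    misses = [0] * (max_size + 1)     # misses[c] = misses at cache size c
    for x in trace:
        if x in stack:
            depth = len(stack) - stack.index(x)  # 1 = top of stack
            stack.remove(x)
        else:
            depth = float("inf")      # cold miss at every cache size
        stack.append(x)
        for c in range(1, max_size + 1):
            if depth > c:             # reuse distance exceeds capacity
                misses[c] += 1
    n = len(trace)
    return [misses[c] / n for c in range(1, max_size + 1)]
```

    Sampling-based techniques such as SHARDS reduce this to near-constant space and near-linear time; the exact version above is fine for small traces and for building intuition about why one pass suffices for all cache sizes.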

    Topics include:
    • Overview and history of the caching algorithm research and practice in industry
    • Introduction to new challenges posed by multi-tier memory hierarchies
    • Review of cache utility curves and recent literature
    • Experimenting with caching policies for production use cases
    • How to find the optimal cache
    Additional Materials:

    Half Day Afternoon

    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    M3
    Advanced Persistent Memory Programming: Local, Remote and Cross-Platform
    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

     
    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    1:30 pm–5:00 pm

    Grand Ballroom ABCD

    Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and from NVDIMM, PCI Express, storage-attached and network-attached interconnect access, it is available broadly across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

    Our tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA NonVolatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and via language and compiler approaches as well. The software PM landscape is already rich, and growing.

    Additionally, the tutorial will explore the considerations that arise when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access paths. While the programming paradigms remain common, the implications for latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and the application developer or designer.

    Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.

    Topics include:
    • Persistent memory
    • Persistent memory technologies
    • Remote persistent memory
    • Programming interfaces
    • Operating systems
    • Open source libraries
    • RDMA
    Additional Materials:

    Sandeep Uttamchandani, Intuit

    Sandeep Uttamchandani is a Distinguished Engineer at Intuit, focusing on platforms for storage, databases, analytics, and machine learning. Prior to Intuit, Sandeep was co-founder and CEO of a machine learning startup focused on finding security vulnerabilities in Cloud Native deployment stacks. Sandeep has nearly two decades of experience in storage and data platforms and has held various technical leadership roles at VMware and IBM. Over his career, Sandeep has contributed to multiple enterprise products, holds 35+ issued patents, has 20+ conference and journal publications, and regularly blogs on All-things-Enterprise-Data. He has a Ph.D. from the University of Illinois at Urbana-Champaign.

    M4
    Weaving Your Enterprise Data Fabric to Harness Machine Learning In-Production
    Sandeep Uttamchandani, Intuit

    Sandeep Uttamchandani is a Distinguished Engineer at Intuit, focusing on platforms for storage, databases, analytics, and machine learning. Prior to Intuit, Sandeep was co-founder and CEO of a machine learning startup focused on finding security vulnerabilities in Cloud Native deployment stacks. Sandeep has nearly two decades of experience in storage and data platforms and has held various technical leadership roles at VMware and IBM. Over his career, Sandeep has contributed to multiple enterprise products, holds 35+ issued patents, has 20+ conference and journal publications, and regularly blogs on All-things-Enterprise-Data. He has a Ph.D. from the University of Illinois at Urbana-Champaign.

    1:30 pm–5:00 pm

    Grand Ballroom EFGH

    Enterprises today have a plethora of information that needs to be harnessed for business insights. Over the years, Enterprises have made investments in a variety of storage solutions, relational databases, warehouses, NoSQL stores, Big Data analytics platforms, Data Lakes, Cloud Stores, etc. As we enter the era of Machine Learning (ML), it is important to understand how to bring these silos together to discover, build, and deploy ML models in production.

    This tutorial covers the technical concepts and architectural models required to operationalize and architect your Enterprise Data Fabric for ML initiatives. The tutorial is divided into the following sections:

    • A Data Engineering perspective on the end-to-end ML workflow in-production
    • Taxonomy of requirements & landscape of available building blocks for the Data Fabric
    • Putting it together: Defining the Data Fabric architecture with reference examples

    The tutorial assumes a basic knowledge of popular Big Data and Analytics solutions. We assume no ML background—our focus will be on operational concepts rather than the internal mathematical formulations of ML algorithms. The tutorial is designed for Storage architects, Data Engineers, and Engineering Managers interested in learning to design Data Fabrics.

    Topics include:
    • Different architectures for Data Stores (Relational, MPP, NoSQL, Event Stores, In-memory grids, etc.)
    • Different Analytic programming models and Frameworks (Batch, Interactive, Stream)
    • Example Cloud computing platforms for Data Management
    • Workflow for Machine Learning models in production
    • Blue-print of a Data Fabric
    • Example reference architectures of Data Fabric deployments
    Additional Materials:
    Monday, February 25, 2019

    Half Day Morning

    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through operating systems, network services, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. While at Panasas he helped build the PanFS cluster file system. He is the creator of the TclHttpd web server and the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    Morning Tutorial 1:
    Understanding Large Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through operating systems, network services, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. While at Panasas he helped build the PanFS cluster file system. He is the creator of the TclHttpd web server and the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    Constitution Ballroom A

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices including traditional hard drives, SSD, and new non-volatile memory devices. Next, we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include:
    • SSD technology
    • NVRAM
    • Scaling the data path
    • Scaling metadata
    • Fault tolerance
    • Manageability
    • Cloud storage
    Additional Materials:

    Half Day Afternoon

    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    Afternoon Tutorial 1:
    Advanced Persistent Memory Programming
    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

     
    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    2:00 pm–5:30 pm

    Constitution Ballroom A

    Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and from NVDIMM and PCI Express to storage-attached and network-attached interconnects, it is available broadly across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

    The tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA Non-Volatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and language and compiler approaches. The software PM landscape is already rich and growing.

    Additionally, the tutorial will explore the considerations when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access. While the programming paradigms remain common, the implications on latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and the application developer or designer.

    Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.
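
    The store-then-flush discipline at the heart of these programming models can be illustrated with a minimal Python sketch over a plain memory-mapped file. On a DAX-mounted PM filesystem the flush becomes a true persistence point; production code would typically use PMDK's libpmem and cache-line flush instructions instead. The path and helper names here are illustrative:

    ```python
    import mmap
    import os
    import struct

    PATH = "/tmp/pmem_demo.bin"   # stand-in for a file on a DAX-mounted PM filesystem
    SIZE = 4096

    def open_pm(path, size):
        """Map a file into the address space for load/store access."""
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        os.ftruncate(fd, size)
        mm = mmap.mmap(fd, size)
        os.close(fd)              # the mapping stays valid after closing the fd
        return mm

    def store_counter(mm, value):
        """Store a value with ordinary memory writes, then make it durable."""
        mm[0:8] = struct.pack("<Q", value)
        mm.flush()                # on a DAX mapping this is the persistence point

    def load_counter(mm):
        """Read the value back with ordinary memory loads—no read() syscall."""
        return struct.unpack("<Q", mm[0:8])[0]
    ```

    After a crash, re-mapping the same file and reading the counter recovers the last flushed value—the recovery contract that the flush step is there to guarantee.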

    Topics include:
    • Persistent Memory
    • Persistent Memory Technologies
    • Remote Persistent Memory
    • Programming Interfaces
    • Operating Systems
    • Open Source Libraries
    • RDMA
    Additional Materials:

    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor for flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching: he worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and grants from the Icelandic Center for Research.

    Afternoon Tutorial 2:
    Caches in the Modern Memory Hierarchy with Persistent Memory and Flash
    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor for flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

     
    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching: he worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and grants from the Icelandic Center for Research.

    2:00 pm–5:30 pm

    Constitution Ballroom B

    For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy has been highly motivated and continues to deliver strong improvements to this day. However, certain areas of cache algorithm optimization have only recently experienced breakthroughs.

    In this tutorial, we will start by reviewing the history of caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies that are getting more complex and deep due to hardware innovations. These hierarchies motivate revisiting multi-tier algorithms. We will then review a key tool of cache research and management, the cache utility curve, along with recent literature that has made such curves much easier to compute. Using this tool, we will dig into caching policies and their trade-offs. We will also spend some time thinking about optimality for caches in modern memory hierarchies with DRAM, non-volatile/persistent memory, and flash.
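
    As a concrete illustration of a cache utility curve, the sketch below computes an LRU miss-ratio curve using Mattson's classic stack-distance algorithm—the naive O(N·M) formulation, not the faster techniques from recent literature. Function and variable names are illustrative:

    ```python
    from collections import Counter

    def lru_miss_ratio_curve(trace, max_size):
        """Return {cache_size: miss_ratio} for an LRU cache over a reference trace.

        The stack distance of a reference is the number of distinct items
        touched since the last access to that item; an LRU cache of at least
        that size would have hit."""
        stack = []        # LRU stack: most recently used item first
        hist = Counter()  # hist[d] = number of references with stack distance d
        for x in trace:
            if x in stack:
                d = stack.index(x) + 1   # 1-based stack distance
                hist[d] += 1
                stack.remove(x)
            # cold misses have infinite stack distance: never a hit
            stack.insert(0, x)
        n = len(trace)
        curve, hits = {}, 0
        for size in range(1, max_size + 1):
            hits += hist[size]           # all refs with distance <= size hit
            curve[size] = 1.0 - hits / n
        return curve
    ```

    For the trace a b c a b c, every repeated reference has stack distance 3, so the curve stays at a 100% miss ratio until the cache holds 3 items, then drops to 50%—exactly the kind of cliff that makes utility curves useful for sizing decisions.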

    Topics include:
    • Overview and history of the caching algorithm research and practice in industry
    • Introduction to new challenges posed by multi-tier memory hierarchies
    • Review of cache utility curves and recent literature
    • Experimenting with caching policies for production use cases
    • How to find the optimal cache
    Additional Materials: