Full Training Program

Sunday, December 9, 2012

Full Day


S1
Securing Linux Servers
Rik Farrow, Security Consultant

Rik Farrow (S1, M1) began working with UNIX system security in 1984 and with TCP/IP networks in 1988. He taught his first security class in 1987 and started teaching internationally the following year. He has been a consultant since 1980 and has advised both firewall and intrusion detection companies in the design of their products. Rik has published two books, one on UNIX security and the other on system administration. He designed a five-day, hands-on class in Internet security for internal use by the Department of Defense. He wrote the "Network Defense" column for Network Magazine for over six years and is currently the editor of ;login:, the USENIX magazine. Rik lives with his wife in the high desert of northern Arizona, where he continues to work and do research, and he still ventures out to teach.

9:00 am–5:00 pm

Nautilus 3

This tutorial is also available for live streaming. 

Linux servers start out very secure: it's what you do with them when you use them that can create insecurities. A straight Linux server install runs minimal services and has few users and a very restrictive firewall, which is a great security posture but is pretty useless for most purposes. As users are added, services enabled, and holes punched through the firewall, the security can deteriorate quickly.

This class will show you how to maintain a strong security posture through careful configuration and proper use of Linux tools and services. Linux contains tools and software that can be enabled to slow brute-force attacks against user accounts, can notice when your accounts have weak passwords or are under attack, can keep services and software up to date, and can sandbox applications to prevent even zero-day attacks. The class will focus on attacks most recently seen, including attacks on mail and Web servers.
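As a small taste of the monitoring side of this, the following sketch (not taken from the class materials; the sample log lines are invented) counts failed SSH logins per source address, which is the same signal that tools such as fail2ban act on:

```shell
# Build a tiny sample sshd log; on a real server you would read
# /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (Red Hat) instead.
cat > /tmp/auth.sample <<'EOF'
Dec  9 10:00:01 host sshd[123]: Failed password for root from 203.0.113.5 port 4222 ssh2
Dec  9 10:00:02 host sshd[124]: Failed password for invalid user admin from 203.0.113.5 port 4223 ssh2
Dec  9 10:00:03 host sshd[125]: Accepted password for alice from 198.51.100.7 port 4224 ssh2
Dec  9 10:00:04 host sshd[126]: Failed password for root from 203.0.113.5 port 4225 ssh2
EOF

# Count failed logins per source IP, worst offender first.
grep 'Failed password' /tmp/auth.sample |
  sed -E 's/.* from ([0-9.]+) .*/\1/' |
  sort | uniq -c | sort -rn
```

Addresses that cross a threshold here are candidates for a firewall block or a fail2ban jail.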

Who should attend:

Linux system administrators and security managers familiar with Linux system administration, whether you manage a handful of Linux systems or entire clusters.

Take back to work:

Techniques for securing and maintaining Linux servers.

Topics include:
  • Minimizing risk with appropriate restrictions
  • Managing and tracking application vulnerabilities
  • Sandboxing to prevent attacks
  • Monitoring logfiles
  • Updates and configuration management

 



S2
Using Amazon Web Services
William LeFebvre, CSE

William LeFebvre (S2) has been banging on UNIX systems since 1983, and was first exposed to computer networking with the original ARPANet. He has been teaching at conferences since 1991. William is currently the chief architect for the digital group at Career Sports and Entertainment. He designs and creates production web environments, and consults with clients on a wide range of Internet technologies.

 
Marc Staveley

Marc Staveley (S2) is an independent consultant who applies his years of experience with UNIX development and administration to helping clients with server consolidation and application migration projects. Previously he held positions at SOMA Networks, Sun Microsystems, NCR, and Princeton University. He is a frequent speaker on standards-based development, multi-threaded programming, system administration, and performance tuning.

9:00 am–5:00 pm

Nautilus 2

Amazon offers a solid collection of cloud services through Amazon Web Services (AWS). These include virtual machines and storage, load balancers, replicated databases, content delivery, and automatic scaling and monitoring. AWS provides a very rich API for building applications that use these services, but the actual user interfaces can be difficult to master. This tutorial introduces Amazon Web Services, describes how the more popular services fit together to support an infrastructure, and provides in-depth instruction on the user-level interfaces to the more popular services: EC2, EBS, ELB, S3, CloudFront, and others. Three interfaces will be taught: the Web console, the Firefox plug-in Elasticfox, and the command-line tools. Cautions and pitfalls will be presented along the way to help students avoid the common mistakes of first-time AWS users.
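To give a flavor of the command-line tools that will be covered, a short session with Amazon's EC2 API tools might look like the following. This is an illustrative sketch, not part of the course materials: it assumes configured AWS credentials, and every ID and name in it is a placeholder.

```shell
# Launch one small instance from a placeholder AMI, using an existing key pair
ec2-run-instances ami-12345678 -n 1 -t m1.small -k my-keypair

# List instances, their state, and their public DNS names
ec2-describe-instances

# Create a 10 GB EBS volume and attach it to the new instance
ec2-create-volume -s 10 -z us-east-1a
ec2-attach-volume vol-87654321 -i i-11223344 -d /dev/sdf
```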

Who should attend:

System administrators who currently use or are considering the use of Amazon Web Services (AWS), as well as individuals who are tasked with supporting AWS for production services, especially if they are unfamiliar or uncomfortable with the command-line tools and the Web-based interfaces supplied by Amazon. Experience with cloud computing is not required. Experience with the installation and support of basic tools and languages (especially Java and Ruby) would be beneficial. Time will only permit us to study the creation and support of Linux instances in the Amazon cloud.

Take back to work:

Knowledge of the techniques, pitfalls, commands, and programs that will help you make effective use of the Amazon cloud.

Topics include:
  • Introduction to AWS
  • Elastic Compute Cloud (EC2)
  • Elastic Block Store (EBS)
  • Simple Storage Service (S3)
  • Elastic Load Balancing (ELB)
  • Relational Database Service (RDS)
  • More in-depth topics: accessing EC2 instance data from within the instance, boot-time scripts in common AMIs

 



SuperSysadmin
S3
Seven Habits of the Highly Effective System Administrator: Hints, Tricks, Techniques, and Tools of the Trade
Lee Damon, University of Washington

Lee Damon (S3) has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04, co-chaired CasITconf '11, and is co-chairing CasITconf '13.

 
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

9:00 am–5:00 pm

Nautilus 5

We aim to accelerate the experience curve for junior system administrators by teaching them the tricks (and effective coping strategies) that experienced administrators take for granted and which are necessary for successful growth of both the administrator and the site.

The class covers many of the best practices that senior administrators have long incorporated in their work. We will touch on tools you should use, as well as tools you should try to avoid. We will touch on things that come up frequently, as well as those which happen only once or twice a year. We will look at a basic security approach.

 

Who should attend:

Junior system administrators with little to 3+ years of experience in computer system administration. We will focus on enabling the junior system administrator to "do it right the first time." Some topics will use UNIX-specific tools as examples, but the class is applicable to any sysadmin and any OS. Most of the material covered is "the other 90%" of system administration—things every sysadmin needs to do and to know, but which aren't details of specific technical implementation.

Take back to work:

Ideas about how to improve and to streamline your systems and your workload, and, just as important, where to look to find more answers.

Topics include:
  • The five things every site should know
  • Why your computers should all agree on what time it is
  • Why root passwords should not be the same on every computer
  • Why backing up every file system on every computer is not always a good idea
  • Policies—where you want them and where you might want to avoid them
  • Ethical issues
  • Growth and success as a solo-sysadmin as well as in small, medium, and large teams
  • Training, mentoring, and personal growth planning
  • Site planning and roadmaps
  • Budgeting
  • Logistics
  • Books that can help you and your users

Half Day Morning


Cloud Computing
S4
VMware vCloud Concepts, Technology, and Operations
New!
John Arrasjid, VMware

John Arrasjid (S4) is a Principal Architect at VMware, specializing in Enterprise Private Cloud (vCloud), business continuity, disaster recovery, and virtual datacenter architecture design, and is the lead architect for the vCloud Architecture Design Toolkit (vCAT). He has over 20 years of IT and consulting experience. John regularly presents at conferences (VMworld, VMware Partner Exchange, USENIX Annual Technical Conference, and USENIX LISA). John is currently serving as Vice President on the USENIX Board of Directors and has published three books in the SAGE Short Topics Series: Cloud Computing with VMware vCloud Director, Foundation for Cloud Computing with VMware vSphere 4, and Deploying the VMware Infrastructure. John is a founding member of VMware bands Elastic Sky and The Hypervisors. John holds both VCP and VCDX (#001) certifications. John has a Bachelor of Science in Computer Science from SUNY Buffalo, NY.

 
Ben Lin, VMware

Ben Lin (S4, S7) is a vCloud Solutions Specialist for the VMware Global Accounts team. He has been closely involved with vCloud solutions, having performed some of the first designs and deployments worldwide. Ben holds VCDX3/4 certification and is an author of Cloud Computing with VMware vCloud Director. Ben was a document lead for vCAT and helped create the vCloud Accelerator service kit used by field and partners worldwide. He has presented at the VMworld 2011, VMworld Europe 2011, Partner Exchange 2012, Partner Exchange 2011, HotCloud '11, LISA '11, and LISA '10 conferences.

9:00 am–12:30 pm

Marina 6

This class will focus on the latest vCloud technologies. The technology, design, implementation, and management will be covered concisely, and various aspects will be demonstrated.

This is a vendor-specific class. VMware-specific details will be covered, but concepts may be applicable to other cloud computing technologies. This session will cover the VMware perspective on implementing a cloud solution, the technologies used, third-party integration considerations, and operational concepts.

Who should attend:

Consumers, system administrators, cloud operators, and cloud architects who are interested in learning about the VMware vCloud solution, which is based on NIST guidelines. Experience with VMware vSphere, VMware Chargeback, and Distributed Virtual Switches is preferred but not required.

Take back to work:
  • How VMware vCloud implements cloud functionality based on the NIST guidelines
  • The basics of developing a cloud solution with VMware
  • An understanding of the concepts and technology used in a vCloud-based cloud computing infrastructure
  • How to utilize the VMware vCloud Architecture Toolkit (vCAT) for learning, architecting, operating, and consuming a VMware vCloud
Topics include:
  • VMware vCloud core concepts and features
  • The VMware suite of technologies used to stand up cloud computing environments, including public/private/hybrid and on/off premises
  • VMware vCloud mechanisms for multi-tenancy, resource elasticity, segmentation of resources, and provisioning mechanisms through the use of VMware vCloud Director, vShield, vCenter Chargeback, vCloud Connector, and other technologies
  • Sections of the vCAT materials and how they are used by architects, operators, and consumers of a vCloud-based deployment
  • Demonstration of features


Cloud Computing
S5
RPM Packaging for Sysadmins
David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years and acted as a consultant for an additional 3 years. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

9:00 am–12:30 pm

Nautilus 4

Packaging software is a must for consistent and automated system provisioning and maintenance, yet very few people do so. We will explore why sysadmins should package software, what benefits accrue to software packagers, and how to package your software.
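To give a sense of scale, a complete spec file for packaging a single script can be quite small. The following is a hypothetical example (every name, path, and email in it is illustrative):

```
Name:           mytool
Version:        1.0
Release:        1%{?dist}
Summary:        Site-local administration script
License:        MIT
Source0:        mytool-1.0.tar.gz
BuildArch:      noarch

%description
A hypothetical script packaged for consistent, automated deployment.

%prep
%setup -q

%install
mkdir -p %{buildroot}%{_bindir}
install -m 0755 mytool %{buildroot}%{_bindir}/mytool

%files
%{_bindir}/mytool

%changelog
* Sun Dec 09 2012 Jane Admin <jane@example.com> - 1.0-1
- Initial package
```

Given a matching source tarball in ~/rpmbuild/SOURCES, `rpmbuild -ba mytool.spec` produces both source and binary RPMs.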

Who should attend:

System administrators with a modicum of experience who have an interest in further automating their environments, and advanced sysadmins who have little or no experience with packaging.

Take back to work:

An understanding of the benefits of packaging and the ability to begin packaging immediately.

Topics include:
  • Automation
  • Consistency
  • Packaging's serendipitous benefits
  • How to package software
  • Tools to make packaging easier


S6
Agile Software Development: Getting It Out the Door Successfully
Geoff Halprin, The SysAdmin Group

Geoff Halprin (S6, S9, M12) has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the SAGE Short Topics book A System Administrator's Guide to Auditing and was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

9:00 am–12:30 pm

Nautilus 1

The days of monolithic software development are gone. Agile (a combination of ideas from extreme programming and lean manufacturing) is now the normal practice, even amongst the largest corporations. Agile flips traditional software development on its head, moving everything into small time-boxed iterations of a few weeks. Imagine shipping software every few weeks!

Who should attend:

Programmers who wish to learn a better way to deliver software; programmers who have heard of agile, but don’t understand what the fuss is all about; sysadmins who wish to learn a simple set of techniques to help them maintain their scripting and infrastructure automation.

Take back to work:

An understanding of all of the key concepts for agile software development; an immediate ability to apply these concepts and improve your delivery.

Topics include:
  • The history of agile development
  • Key concepts: iterations, stories, planning, showcases, retrospectives, Kanban, and visible workflow
  • How they all come together into a cohesive, radically simpler way to develop software

Half Day Afternoon


Cloud Computing
S7
vCloud Architecture Design with vCAT (vCloud Architecture Toolkit)
New!
Ben Lin, VMware


 
David Hill, VMware

David Hill (S7, M8) is a Senior Solutions Architect working at VMware, specializing in cloud computing, disaster recovery, and virtualization. He is a VMware Certified Advanced Professional (VCAP) and a VMware Certified Professional (VCP). David is a lead architect for the vCloud Architecture Toolkit. Before joining VMware, he was a self-employed IT consultant and architect for about fifteen years, working on projects for large consultancies and financial institutions.

1:30 pm–5:00 pm

Marina 6

This is a vendor-specific class. VMware vCloud is a suite of VMware technologies used to stand up cloud computing environments, including public/private/hybrid and on/off premises. VMware vCloud provides multi-tenancy, resource elasticity, segmentation of resources, and provisioning mechanisms through the use of VMware vCloud Director, vShield, vCenter Chargeback, vCloud Connector, and other technologies. Experience with VMware vSphere, VMware Chargeback, and Distributed Virtual Switches is preferred but not required. It is recommended, but not required, that attendees have previous knowledge of vCloud or have attended "VMware vCloud Concepts, Technology, and Operations."

Who should attend:

Cloud architects and cloud operators who are interested in deploying a VMware vCloud.

Take back to work:

The knowledge needed to deploy a VMware Cloud for use as an enterprise private cloud.

Topics include:
  • VMware vCloud core concepts and features
  • Architecture design considerations
  • vCloud architecture design patterns and best practices
  • Demonstration of features
S8
MongoDB: NoSQL Operations Hands On
New!
1:30 pm–5:00 pm

Nautilus 4

MongoDB is an open-source, document-oriented, NoSQL database designed with both scalability and agility in mind. The goal of MongoDB is to bridge the gap between key-value stores (which are fast and scalable) and relational databases (which have rich functionality).

Attendees will work through several model operational scenarios, covering planned and unplanned maintenance tasks, backup and recovery processes, responses to database growth requirements, and more!
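For example, initiating a three-member replica set, one of the maintenance scenarios covered, looks roughly like this from the mongo shell. This is a sketch rather than the class exercise itself: the hostnames are placeholders, and each mongod is assumed to have been started with --replSet rs0.

```shell
# Initiate a three-member replica set from any one of the members
mongo --host db1.example.com --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})'

# Confirm that a primary has been elected
mongo --host db1.example.com --eval 'printjson(rs.status())'
```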

Who should attend:

Anyone interested in learning about operating a MongoDB deployment.

Take back to work:

What you need to know to successfully deploy and maintain a MongoDB database, diagnose performance issues, import and export data from MongoDB, and establish the proper backup and restore routines.

Topics include:
  • Setting up replica sets
  • Migrating servers to different hosts
  • Repairing a deployment after hardware failure
  • Scaling out with auto-sharding


S9
Introduction to Provisioning
New!
Geoff Halprin, The SysAdmin Group


1:30 pm–5:00 pm

Nautilus 1

As sysadmins, provisioning is one of the most obvious, basic, and important things we do. This simple topic covers areas as diverse as establishing the right standards, automated delivery (where applicable), ensuring software license compliance, and, oh, actually building the machine (physical or virtual).

For such an obvious part of our duties, it is surprising just how much thought goes into effective provisioning. This tutorial will cover a wide gamut of issues, many of which must be performed long before the machine is ever built.

 

Who should attend:

Sysadmins, IT/IS managers, and any other folks responsible for new systems (real or virtual), installations, and integrations; sysadmins looking towards the cloud, infrastructure automation, and more maintainable systems.

 

Take back to work:

A deep understanding of the provisioning process and its context in the wider infrastructure lifecycle; processes and best practices for efficient and timely roll-out and integration of systems.

Topics include:
  • The infrastructure lifecycle 
  • The three parts to provisioning: planning, pre-provisioning, and provisioning 
  • Automation 
  • The MUST haves (Mandatory Ubiquitous Service Transport) 
  • Management traffic: in-band, out-of-band, and image distribution 
  • The build (physical and virtual) 
  • Software licensing and other topics you probably didn’t think of
Monday, December 10, 2012

Full Day


M1
SELinux (Security-Enhanced Linux)
Rik Farrow, Security Consultant


9:00 am–5:00 pm

Nautilus 4

SELinux provides an extra layer of security for most Linux systems—if you leave it enabled. Most commonly, SELinux gets disabled as the first step when debugging system problems, even when it is not the problem. SELinux can stop many attacks, even previously unknown (zero-day) attacks, as it confines applications' access to files, directories, commands, and network sockets.

This class will show you how to work with SELinux: how to determine if SELinux is blocking an application and how to adjust policy to move beyond problems. SELinux includes many tools for viewing audit logs, file and process contexts, modifying policy, and even interpreting log messages, and you will learn how to use these tools. You will learn how to modify file contexts, add new policy, monitor logs both graphically and in text-only mode, and, most importantly, how to recover full SELinux coverage on systems where it has been disabled. The class will cover reading and modifying existing policy where necessary, so that changes to services, such as non-standard directory locations, are accommodated. The class will also investigate adding new, custom services to SELinux policy.
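To preview a few of these tools, a typical SELinux troubleshooting session might include commands like the following (the local policy module name is illustrative):

```shell
# Check the current mode (Enforcing, Permissive, or Disabled)
getenforce

# Search the audit log for recent AVC denials involving httpd
ausearch -m avc -ts recent -c httpd

# Restore default file contexts after content has been moved
restorecon -Rv /srv/www

# Prefer flipping an existing boolean over writing new policy
setsebool -P httpd_can_network_connect on

# Generate, then load, a local policy module from logged denials
ausearch -m avc -ts recent | audit2allow -M mylocal
semodule -i mylocal.pp
```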

This class includes exercises that will be performed using a provided VM.

 

Who should attend:

Linux system administrators and security managers who want or are required to use SELinux. Participants must be familiar with Linux system administration; previous frustration with SELinux is expected but not required.

Take back to work:

The ability to run Linux servers and desktops with SELinux enabled and to modify policy to handle configurations not supported by the default policy.

Topics include:
  • SELinux uncloaked
    • Types, contexts, and roles
    • Context-based policy
    • Extensions to familiar commands
    • Using the sandbox command
  • Using the audit file
    • Tools for deciphering audit messages
    • Searching audit messages
    • Using setroubleshoot
  • Adjusting file/directory context
    • Fixing common access problems
  • Using Booleans to adjust policy
  • Extending policy
    • Using audit2allow to correct policy
    • Using sepolgen to create new policies
    • Restricting users
    • SELinux rule syntax
    • Understanding and using macros
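As a sketch of the context and Boolean adjustments listed above (the path and Boolean are illustrative examples; always review generated policy before loading it):

```shell
# Adjust file context: record the rule persistently, then relabel
semanage fcontext -a -t httpd_sys_content_t '/srv/www(/.*)?'
restorecon -Rv /srv/www

# Prefer flipping an existing Boolean over writing new policy
getsebool httpd_can_network_connect
setsebool -P httpd_can_network_connect on

# Last resort: build a local policy module from logged denials
ausearch -m avc -ts recent | audit2allow -M mylocal
semodule -i mylocal.pp
```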

 


Shumon Huque, University of Pennsylvania

Shumon Huque (M2, T2) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade. Web site: http://www.huque.com/~shuque/.

IPv6 and DNSSEC
M2
Using and Migrating to IPv6
Updated!
Shumon Huque, University of Pennsylvania


9:00 am–5:00 pm

Nautilus 3

The Internet is facing an imminent exhaustion of IP addresses. IPv6, the next-generation Internet Protocol, is designed to address this problem, among other things. If you have not yet started to deploy IPv6, now is the time. The "World IPv6 Launch" event took place in June 2012, and major industry players such as Google, Facebook, Comcast, Yahoo!, Akamai, and Cisco already support IPv6, with many others coming on board in the near future. You need to be next!

This tutorial will provide a detailed introduction to IPv6 and will also cover practical aspects of configuring and using IPv6 in networks, operating systems, and applications. Basic knowledge of IPv4 is assumed, but no prior knowledge of IPv6 is necessary. The course will go over the basics of IPv6 and dive into the details you need.

Who should attend:

System administrators, network administrators, and application developers who need to prepare for migration to IPv6, and anyone who wants a general introduction to IPv6 and what is involved in deploying it.

Take back to work:

An understanding of IPv6, with the basic knowledge to begin designing and deploying IPv6 networks, systems, and applications.

Topics include:
  • The current IP address landscape and the motivation for IPv6
  • IPv6 addresses and protocol details
  • DHCPv6
  • DNS
  • Tunneling
  • Configuring hosts and application services
  • IPv6-related security topics

Time permitting, and depending on audience interests, further topics can be covered, such as configuring IPv6 routing or programming examples.
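To hint at the practical side, basic IPv6 configuration and checks on a Linux host might look like this (the interface name and addresses are illustrative; 2001:db8::/32 is the reserved documentation prefix):

```shell
ip -6 addr add 2001:db8:10::2/64 dev eth0    # static address assignment
ip -6 route add default via 2001:db8:10::1   # default gateway
ip -6 addr show dev eth0                     # note the fe80:: link-local address too
ping6 -c 3 2001:db8:10::1                    # basic reachability
dig AAAA www.example.com                     # does the name resolve over IPv6?
```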


Half Day Morning

Mahesh Rajani, VMware

Mahesh Rajani (M3, M8) is a Consulting Architect in the CoE group at VMware. He has been in the IT industry for over fifteen years, serving in various roles from developer through system administrator to cloud architect. Mahesh is a lead architect for the vCloud Architecture Design Toolkit. He has a Master's degree in electrical engineering from Texas A&M. He is also a VMware Certified Design Expert (VCDX-34).

Cloud Computing
M3
vCloud Advanced Topics 1: Compliance, Security, and Networking with VMware Network and Security Technologies
New!
Mahesh Rajani, VMware

9:00 am–12:30 pm

Marina 2

This class provides networking and security details of the VXLAN and vShield technologies as used in a vCloud infrastructure design. Some concepts also apply to a vSphere deployment, so the class will be of value to both vSphere- and vCloud-skilled attendees.

We will show how to use the VMware vCloud Architecture Toolkit (vCAT) in the architecture design and operation of networking and security in a vCloud infrastructure.

Attendees will learn design and operational concepts and guidelines for addressing compliance within a vCloud through the use of VMware technologies for networking and security.

Who should attend:

Cloud architects and cloud operators who are interested in the networking and security components tied to design and operation of a vCloud infrastructure. Prior knowledge of vCloud or attendance at Sunday's vCloud classes is strongly recommended but not required.

Take back to work:

Best practices and guidelines from vCAT to support business requirements for compliance and connectivity.

Topics include:
  • vCloud networking, including advanced topics such as vShield and VXLAN
  • vCloud security considerations for design and operations
  • Demonstration of features

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

M4
Core Skills: Scripting for Automation
New!
Mike Ciavarella, Coffee Bean Software Pty Ltd


9:00 am–12:30 pm

Nautilus 5

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

So how can you achieve this state of nirvana? The answer: scripting.

This class is a practical crash course in writing useful scripts that solve real-world system administration problems, using a combination of bash, Perl, and friends.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and of how to edit files in your favorite flavor of U*X is assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete the in-class tasks.

If there is sufficient interest, we will schedule a BoF on scripting challenges, where we can work through them interactively as a group.

Who should attend:

Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:

An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

Topics include:
  • Controlling programs and processes
  • Script building blocks
  • Searching
  • Working with data
  • Pipes and performance
  • Tying them all together to write effective scripts
  • When scripts might not be the best choice
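A small example of the kind of script the class builds: summarizing failed SSH logins per source address with awk, sort, and uniq. The log lines here are made-up sample data; in practice you would point the function at /var/log/auth.log or similar.

```shell
#!/bin/sh
# Summarize failed SSH login attempts per source address, busiest first.

count_failures() {
  # The source IP is the fourth-from-last field in an sshd "Failed password" line
  awk '/Failed password/ {print $(NF-3)}' "$1" | sort | uniq -c | sort -rn
}

# Made-up sample data standing in for /var/log/auth.log
cat > sample.log <<'EOF'
Dec 10 09:01:02 host sshd[100]: Failed password for root from 192.0.2.7 port 4242 ssh2
Dec 10 09:01:05 host sshd[101]: Failed password for root from 192.0.2.7 port 4243 ssh2
Dec 10 09:02:10 host sshd[102]: Failed password for admin from 198.51.100.3 port 9999 ssh2
EOF

count_failures sample.log
```

The same three-stage pattern (filter, normalize, count) automates a surprising share of everyday log analysis.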

Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

SuperSysadmin
M5
Introduction to Time Management for System Administrators
Thomas A. Limoncelli, Google, Inc.


9:00 am–12:30 pm

Nautilus 2

This tutorial is also available for live streaming. 

If you agree with any of these statements, this class is for you:

  • I don't have enough time to get all my work done.
  • As a sysadmin, I can't schedule, prioritize, or plan my work.
  • I'm spending all my time mopping the floor; I don't have time to fix the leaking pipe.
  • My boss says I don't work hard enough, but I'm always working my ____ off!
Who should attend:

Sysadmins and developers who need more time in their day, who have problems getting projects done because of constant interruptions, or who want more control over their time and the ability to schedule work instead of working at the whim of their users.

Take back to work:

The skills you need to get more done in less time.

Topics include:
  • Introduction
    • Why typical "time management" books don't work for sysadmins
    • What makes "to-do" lists fail, and how to make them work
    • How to eliminate "I forgot" from your vocabulary
  • Managing interruptions
    • Preventing them from getting to you
    • Managing the ones you get
    • Sharing the load with co-workers
  • To-do lists
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Scheduling your work (for a sysadmin? really?)
  • Task grouping: Batching, sharding, and multitasking
  • Handling the most difficult days
    • The day before a vacation
    • The day a big outage disrupts your perfectly planned day

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years who then spent an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

Cloud Computing
M6
Building a Massively Scalable, Geographically Disparate IaaS Cloud
New!
David Nalley, Apache CloudStack/Fedora


9:00 am–12:30 pm

Nautilus 1

In this half-day class we'll give system administrators an understanding of IaaS clouds, with the reference implementation being Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud. Experience with virtualization platforms and a deep understanding of L2/L3 networking are preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud, based on Apache CloudStack, in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack
  • Next-generation network topologies, including SDN
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes
  • Maintaining availability in a "failure is assured" environment

Joshua Jensen, Cisco Systems

Joshua Jensen (M7, T1) was until recently Cisco Systems' Lead Linux IT Engineer, but is now a one-man IT shop for an autonomous project within Cisco focused on world domination. He has worked as an IBM consultant and was Red Hat's first instructor, examiner, and RHCE. Working with Linux for the past 15 years and for Red Hat for 4 1/2 years, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

M7
Linux Network Service Administration
New!
Joshua Jensen, Cisco Systems


9:00 am–12:30 pm

Marina 6

System administrators are being tasked with bringing their office environments online, whether that's one stand-alone client attached to the Internet or a distributed network of Web servers. The network services that need to be configured in order to do this can be daunting to administrators who aren't familiar with the required applications. We will cover configuration examples, as well as overviews of the underlying protocols, that attendees can take home and put to work immediately.

The tutorial will be conducted in an open manner that encourages question-and-answer interruptions.

Who should attend:

System administrators who are implementing network services and are looking for a background in the configuration of those services as well as basics of the protocols. Attendees should have some network client/server experience and a basic knowledge of UNIX administration, but they do not need to be experienced network administrators. Both new and intermediate network administrators will leave the tutorial having learned something.

Take back to work:

Confidence in the ability to set up and maintain secure network services.

Topics include:
  • Overview
  • Network services
    • SSH: Secure shell with OpenSSH
    • FTP: Explore vsftpd
    • HTTP: Apache, Tux, and Squid
    • SMTP: Postfix MTA
    • NFS: Network File Systems
    • LDAP: Global authentication with OpenLDAP
    • DHCP: DHCPD and PXE
    • DNS: ISC's BIND
    • NTP: Network Time
    • LPD: Printing with cups
  • Host-based security with TCP Wrappers and Xinetd
  • Overview of Linux packet filtering
  • Network monitoring and logging
  • Network utilities you should be using
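As one concrete example, the host-based access control provided by TCP Wrappers is driven by two files; a minimal policy allowing SSH only from one (illustrative) subnet looks like:

```
# /etc/hosts.allow  (consulted first; first match wins)
sshd : 192.0.2.0/255.255.255.0

# /etc/hosts.deny   (consulted only if nothing matched above)
sshd : ALL
```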

Half Day Afternoon

David Hill, VMware

David Hill (S7, M8) is a Senior Solutions Architect working at VMware, specializing in cloud computing, disaster recovery, and virtualization. He is a VMware Certified Advanced Professional (VCAP) and a VMware Certified Professional (VCP). David is a lead architect for the vCloud Architecture Toolkit. Before joining VMware, he was a self-employed IT consultant and architect for about fifteen years, working on projects for large consultancies and financial institutions.

Mahesh Rajani, VMware

Mahesh Rajani (M3, M8) is a Consulting Architect in the CoE group at VMware. He has been in the IT industry for over fifteen years, serving in various roles from developer through system administrator to cloud architect. Mahesh is a lead architect for the vCloud Architecture Design Toolkit. He has a Master's degree in electrical engineering from Texas A&M. He is also a VMware Certified Design Expert (VCDX-34).

Cloud Computing
M8
vCloud Advanced Topics 2: Availability, Business Continuity, and Disaster Recovery
New!
David Hill, VMware

Mahesh Rajani, VMware


1:30 pm–5:00 pm

Marina 2

Learn about availability techniques for vCloud infrastructure and workloads. This class will cover vCloud availability design aspects for site failure, with VMware Site Recovery Manager and storage replication adapters, and for workload failure, utilizing third-party backup/restore technologies.

Who should attend:

Cloud architects and cloud operators who are interested in availability, business continuity, and disaster recovery for both the vCloud infrastructure and the deployed workloads. Prior knowledge of vCloud or attendance at the three previous vCloud classes at LISA '12 is recommended but not required.

Take back to work:

Best practices and guidelines from the VMware vCloud Architecture Toolkit (vCAT) to support business requirements for compliance and connectivity.

Topics include:
  • Backup and recovery of vCloud infrastructure and workloads
  • Site failover considerations and options for vCloud infrastructure and workloads
  • Use of VMware HA, DRS, FT, and other technologies
  • Demonstrations where appropriate, if time permits

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

M9
Advanced Shell Programming
Mike Ciavarella, Coffee Bean Software Pty Ltd


1:30 pm–5:00 pm

Nautilus 5

The humble shell script is still a mainstay of UNIX/Linux system administration, despite the wide availability of alternatives such as Perl, Python, Tcl, and similar languages. This class details techniques that move beyond the quick-and-dirty shell script.

Who should attend:

Intermediate system administrators or anyone with a solid knowledge of programming and with some experience in Bourne/Korn shells or their derivatives.

Take back to work:

An understanding of how to use the "lowly" shell to achieve lofty goals.

Topics include:
  • Common mistakes and unsafe practices
  • Modular shell script programming
  • Building blocks: awk, sed, etc.
  • Writing secure shell scripts
  • Performance tuning
  • Choosing the right utilities for the job
  • Addressing portability at the design stage
  • When not to use shell scripts
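A few of the safe-practice habits above, in one small POSIX sh sketch (a minimal illustration, not production code):

```shell
#!/bin/sh
# Defensive-scripting basics: fail fast, clean up reliably, and keep
# logic in small, testable functions.
set -eu                          # exit on errors and on unset variables

tmpfile=$(mktemp)                # never hard-code /tmp/name.$$
trap 'rm -f "$tmpfile"' EXIT     # cleanup runs even on error or interrupt

# A small function is easier to test and reuse than an inline pipeline
to_upper() { tr '[:lower:]' '[:upper:]'; }

printf '%s\n' "lisa 2012" | to_upper > "$tmpfile"
cat "$tmpfile"
```

Sticking to POSIX constructs like these also addresses portability at the design stage, rather than as an afterthought.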

Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

SuperSysadmin
M10
Advanced Time Management: Team Efficiency
Thomas A. Limoncelli, Google, Inc.


1:30 pm–5:00 pm

Nautilus 2

This tutorial is also available for live streaming. 

Attendees must bring a laptop for in-class exercises. Techniques will apply to any wiki or collaborative document system, but the labs and demos will be done using Google Apps.

Who should attend:

All sysadmins who want to collaborate efficiently within their team and with others (even solo sysadmins will benefit!).

Take back to work:

Techniques to help your IT team work better, faster, and more transparently.

Topics include:
  • Meetings and email
    • Making meetings not suck
    • Handling meetings that can't be fixed
    • Stopping incoming email overload
    • Making sure your email gets read
    • How to get your co-workers to go along with your awesome ideas
  • Working better together using collaborative documents
    • Buy vs. build: How to get a team to agree
    • Common sysadmin uses of collaborative documents
    • Uncommon sysadmin uses of collaborative documents
    • Communicating a new design before you build it
    • Tracking loaner resources
    • Doing surveys
    • Assuring consistent results no matter who does the task
    • Making sure everyone on the team can share the work
    • Quick and easy way to document each service (and why you should)
    • Quick and easy way to document procedures (so others can do them for you)
    • Pager-duty tips for creating a feedback loop to assure constant improvement
    • Template for a simple IT department home page

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator of 10 years who then spent an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

M11
Monitoring Servers, Networks, and Lunchrooms with Zenoss
David Nalley, Apache CloudStack/Fedora


1:30 pm–5:00 pm

Nautilus 1

To quote Tom Limoncelli, "It's not a service if you aren't monitoring it." A decade ago, monitoring meant either cobbled-together, home-grown scripts or massive, inflexible enterprise applications. Since then, monitoring has become a must-have for even the smallest environment: the hodgepodge of scripts has grown unmaintainable, and the enterprise applications, while they work well, are slow to respond to the changes happening everywhere. In that interim, monitoring with open source software has effectively become the de facto standard because, like the browser and the operating system, monitoring is now a commodity.

Zenoss is free/libre open source software for monitoring applications, networks, servers, and even whether the restroom is in use.

Who should attend:

Sysadmins and managers who are planning to use or are evaluating Zenoss as a monitoring platform, and those who are just beginning to explore systems/network monitoring. Sysadmins experiencing scaling/scope issues with other tools such as RRDtool or Nagios will also learn much. Participants are expected to be relatively well versed in operating system and application mechanics.

Take back to work:

 A good grasp of the basics of Zenoss and monitoring theory and the ability to put this information to use immediately, along with a rudimentary understanding of some of the more esoteric features Zenoss offers.

Topics include:
  • Monitoring theory
    • Status monitoring
    • Performance monitoring
    • Predictive monitoring
  • Overview of Zenoss capabilities
  • Overview of Zenoss installation
  • Methods to jumpstart monitoring
    • Auto discovery
    • Templating and inheritance
  • Deep inspection of monitoring capabilities
    • SNMP
    • WMI
    • Network Service Checks
    • Nagios
    • Syslog
    • WBEM
    • Esoteric things
  • Dealing with the information/alerts
    • Reporting
    • Alerts
  • Taking your monitoring to the next level
    • Automated dependency checking
    • Integration with configuration management systems
    • ZenPacks—monitoring everything
    • Event transforms
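As a taste of the SNMP material, the queries Zenoss issues when modeling a device can be reproduced by hand with the net-snmp command-line tools (the switch name and community string here are hypothetical):

```shell
snmpget  -v2c -c public switch1.example.com sysUpTime.0         # fetch a single value
snmpwalk -v2c -c public switch1.example.com IF-MIB::ifDescr     # walk the interface table
snmpwalk -v2c -c public switch1.example.com IF-MIB::ifOperStatus
```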

Geoff Halprin, The SysAdmin Group

Geoff Halprin (S6, S9, M12) has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the SAGE Short Topics book A System Administrator's Guide to Auditing and was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

SuperSysadmin
M12
The First Hundred Days
New!
Geoff Halprin, The SysAdmin Group


1:30 pm–5:00 pm

Marina 6

Starting at a new company gives you exactly one chance to do things differently, to establish a new approach, realign the team's efforts, and change the tenor of the discussion. To be effective in your new role as leader, you must establish rapport with the other departments and with your team, and take control of the work pipeline.

You've made the leap. You're about to start at a new company, maybe as a senior sysadmin, maybe as a team leader or manager. Now you're asking yourself, "What do I do first?"

Moving to a different company or being promoted internally gives us a unique opportunity to put aside the history of a site and look at it with a fresh set of eyes. If you're hired as the new team lead or manager, then knowing how you will get to know the new site, and how you will get on top of any site-consolidation activities, is critical to your longevity in the role.

This class discusses the various aspects of moving from a tactical (bottom-up) view of system administration to a strategic (top-down) view. We cover the initial site survey, the first steps of identifying and cauterizing open wounds, and the process of systemic review and improvement.

Who should attend:

Anyone starting or contemplating a new position, including making an in-house move that enables you to start over; anyone with a new boss who wants to understand and help that boss; anyone about to apply for a senior position who wants to take control of the interview process.

Take back to work:

A set of tools and perspectives that will help you evaluate an environment, from company structure and pain points to IT systems and team skills, and help you engage a team in improvements to the department.

Topics include:
  • Part 1: A topical view
    • Organizational awareness: The boss, the company
    • The team: Assessing the team; first repairs
    • Building the work pipeline; second repairs
    • Systems and processes: Workflow management, change management, event management
    • Round 1: Cauterizing the wound
    • Round 2: Some early wins
    • Round 3: The Big Three
    • Systemic improvement programs
  • Part 2: A temporal view
    • The first day
    • The first week
    • The first month
    • The first hundred days
Additional Materials:
Tuesday, December 11, 2012

Full Day

Joshua Jensen, Cisco Systems

Joshua Jensen (M7, T1) was until recently Cisco Systems' Lead Linux IT Engineer, but is now a one-man IT shop for an autonomous project within Cisco focused on world domination. He has worked as an IBM consultant and was Red Hat's first instructor, examiner, and RHCE. Working with Linux for the past 15 years and for Red Hat for 4 1/2 years, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

SuperSysadmin
T1
High-Availability Linux Clustering with ricci and luci
Joshua Jensen, Cisco Systems

9:00 am–5:00 pm

Marina 3

We are evolving from a single service running on a single server, complete with multiple points of hardware and software failure...but to what? With services and platforms that need to be restored before our pagers or bat-phones ring, system administrators of today have a need for high availability. Starting with the current realities of modern data centers, this full-day tutorial will explore practical uses of Linux clusters.

Who should attend:

Linux administrators who are planning to implement a multi-service failover cluster in a production environment. Course attendees should be familiar with the basics of system administration in a Linux environment. At no point will the word "cloud" be used, although novice administrators and gurus alike should leave the tutorial having learned something.

Take back to work:

The knowledge and ability to create and administer highly available services and filesystems on a Linux cluster.

Topics include:
  • Linux HA Cluster technology: Corosync, OpenAIS, rgmanager, Conga
  • Data management with shared disk implementations: SAN, iSCSI, AoE, FCoE
  • Node fencing with STONITH
  • Network power switches and IPMI
  • Clustered logical volume management
  • GFS2 filesystems with Distributed Lock Manager (DLM)
  • Service management with failover domains
  • Virtual machines as a cluster service
  • Cluster administration with luci
  • Working with cluster-unaware services
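Two ideas underlie most of the topics above: a partition of the cluster may run services only while it holds a strict majority of votes (quorum), and a failed service restarts on the highest-priority surviving node of its failover domain. A minimal sketch of both rules in Python (the function names and the simple equal-vote model are illustrative, not actual cluster-suite code):

```python
def has_quorum(total_votes, votes_present):
    """A partition may run services only while it holds a strict majority of votes."""
    return votes_present >= total_votes // 2 + 1

def pick_node(failover_domain, online):
    """Restart a service on the highest-priority online node of its failover domain."""
    for node in failover_domain:  # the domain list is ordered by priority
        if node in online:
            return node
    return None  # no eligible node: the service stays down

# In a 5-node cluster, the 3-node side of a split keeps quorum; the 2-node side must stop
# (and will typically be fenced) to avoid a split-brain scenario.
quorate = has_quorum(5, 3)
target = pick_node(["node1", "node2", "node3"], online={"node2", "node3"})
```

This is also why even-sized clusters are awkward: a 2–2 split of a 4-node cluster leaves neither side with a majority.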

 


Half Day Morning

Shumon Huque, University of Pennsylvania

Shumon Huque (M2, T2) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade. Web site: http://www.huque.com/~shuque/.

IPv6 and DNSSEC
T2
DNS and DNSSEC
New!
Shumon Huque, University of Pennsylvania

9:00 am–12:30 pm

Nautilus 3

This class will give system administrators an understanding of the DNS protocol, including advanced topics such as DNS security. It will provide practical information about configuring DNS services, using examples from the popular ISC BIND DNS software platform.

Who should attend:

Sysadmins and network engineers who are tasked with providing DNS services, as well as anyone interested in knowing more about how the DNS works.

Take back to work:

An understanding of DNS and DNSSEC, with the basic knowledge necessary to design and deploy DNS services.

Topics include:
  • The DNS protocol and how it works
  • DNS master zone file format
  • Server configurations and recommendations
  • DNSSEC (DNS Security Extensions) and how to deploy it
  • Many examples of DNS query and debugging using the "dig" tool
  • DNS and IPv6
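As a taste of the protocol mechanics listed above, the sketch below hand-assembles a DNS query in the RFC 1035 wire format: a 12-byte header followed by a question section whose name is encoded as length-prefixed labels. This is an illustrative snippet, not course material:

```python
import struct

def build_query(name, qtype=1, qid=0x1234):
    """Assemble a DNS query packet: 12-byte header + question section (RFC 1035)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length; a zero byte terminates the name
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet)
    return header + qname + struct.pack(">HH", qtype, 1)

packet = build_query("www.example.com")
```

Sending `packet` over UDP to port 53 of a resolver and decoding the reply is essentially what the `dig` tool automates for you.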

Steven Murawski, Edgenet

Steven Murawski (T3) is the Senior Windows System Engineer for Edgenet, a data services company, and a Microsoft MVP in PowerShell. In this role, he supports a dynamic infrastructure that pushes the boundaries of the Windows platform.  Steven blogs at UsePowerShell.com (http://blog.usepowershell.com).  Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community (http://gmitpuc.com) and the Greater Milwaukee Script Club (http://blog.usepowershell.com). He speaks regularly to local user groups and can be found at various conferences.

SuperSysadmin
T3
PowerShell Fundamentals
New!
Steven Murawski, Edgenet

9:00 am–12:30 pm

Nautilus 4

This tutorial is also available for live streaming. 

Do you find yourself repeating a task often? Do you have to retrieve information or change settings for a number of servers or users, at once or on a regular basis? Do you find clicking repetitively just too slow to keep up?

If you answered any of these questions with a "Yes," don't miss this half-day class. We will cover a number of ways to make you more productive in less time—and it is far easier than you may think possible.

Who should attend:

System administrators and anyone else who wants to be more productive on the Microsoft Windows platform.

Take back to work:

Usable commands and patterns to make attendees more effective in working with the Windows platform, along with familiarity with the discovery patterns in PowerShell, so that they can continue to develop their skills.

Topics include:
  • Introduction to PowerShell
    • Finding the commands you need
    • What's in the help files
    • Discovering hidden gems in command output
  • Working from the shell
    • Navigating the file system, registry, and more
    • Working with objects (everything in PowerShell is an object)
  • Working with servers and workstations
    • Discovering WMI
    • Working with text
    • Dealing with remote machines via WMI and PowerShell Remoting
  • PowerShell Version 2 and the upcoming release of Version 3

Jacob Farmer, Cambridge Computer Services

Jacob Farmer (T4, T10) is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Cloud Computing
T4
Next-Generation Storage Networking: Beyond Conventional SAN and NAS
Updated!
Jacob Farmer, Cambridge Computer Services

9:00 am–12:30 pm

Marina 2

There has been tremendous innovation in the data storage industry over the past few years.  New storage architectures have come to market to challenge traditional SAN and NAS products with nimble new designs that are much better suited to serving the increasingly virtual nature of applications and server infrastructure. Meanwhile, the allure of cloud computing and the emergence of affordable enterprise-class solid state storage devices have inspired ever more innovative approaches to storage caching, tiering, and deduplication.  This lecture is a survey of the latest trends and advances in the data storage industry.  We trace the I/O path from application to storage media and look at a wide variety of solutions to the ever-changing challenges of data storage.

Who should attend:

System administrators running day-to-day operations, enterprise architects, storage administrators. This tutorial is technical in nature, but it does not address command-line syntax or the operation of specific products or technologies. Rather, the focus is on general architectures and the different ways to tackle various storage management challenges.

Take back to work:

A better understanding of modern storage architectures, various approaches to scaling in both performance and capacity, and a framework for comparing and contrasting various types of storage solutions.

Topics include:
  • The storage I/O path and the fundamentals of storage virtualization
  • Application acceleration with solid state storage devices (SSDs)
  • Automated tiered storage and information life cycle management (ILM)  
  • Deduplication of primary storage 
  • Object storage models and content-addressable storage 
  • Leveraging the cloud for primary storage 
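One of the topics above, automated tiered storage, ultimately boils down to a placement decision: keep the most frequently accessed blocks on the fast (SSD) tier and demote everything else. A toy sketch of that decision (the block names and the simple access-count policy are invented for illustration):

```python
from collections import Counter

def plan_tiering(accesses, ssd_slots):
    """Place the most frequently accessed blocks on the SSD tier, the rest on disk."""
    counts = Counter(accesses)                       # access frequency per block
    hot = [block for block, _ in counts.most_common(ssd_slots)]
    cold = [block for block in counts if block not in hot]
    return hot, cold

# Block "a" is touched three times, "b" twice, "c" once; two SSD slots available.
hot, cold = plan_tiering(["a", "b", "a", "c", "a", "b"], ssd_slots=2)
```

Real tiering engines add recency, write coalescing, and migration cost to this picture, but the frequency-ranking core is the same.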

Mark Burgess, CFEngine, Inc.

Mark Burgess (T5, F3) is the founder, chairman, CTO, and principal author of CFEngine. In 2011 he resigned as Professor of Network and System Administration at Oslo University College, where for twenty years he led the way in theory and practice of automation and policy-based management. In the 1990s he underlined the importance of idempotent, autonomous desired state management ("convergence") and formalized cooperative systems in the 2000s ("promise theory"). He is the author of numerous books and papers on network and system administration, including the USENIX Short Topics books A System Engineer's Guide to Host Configuration and Maintenance Using Cfengine, co-authored with Æleen Frisch, and A Sysadmin's Guide to Navigating the Business World, co-authored with Carolyn Rowland. He has won several prizes for his work.

Carolyn Rowland

Carolyn Rowland (T5) began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. Carolyn will continue as co-chair of WiAC in 2013. 

SuperSysadmin
T5
A Sysadmin's Guide to Navigating the Business World
Mark Burgess, CFEngine, Inc.

Carolyn Rowland

9:00 am–12:30 pm

Nautilus 5

As a system administrator, you are already a crack technical analyst or engineer, but does your management understand the value and importance of your work? When you ask for staff or funding to support new or existing efforts, does management readily support you? Does management look to you to develop the future of IT services in your organization? Do you feel you have enough time to focus on projects and innovation instead of fire-fighting?

If you answered no to several of the above questions, then this course is for you. The first step toward improving your professional quality of life is to create a positive and collaborative relationship with your management. So why not take responsibility for the relationship? Senior management makes the decisions about budget, staffing, and, often, new services, without understanding the full impact of their decisions. What if you could make yourself part of that process? What if it was easy to ask your management for more resources because they already believed strongly in your positive contribution to the organization?

System administrators often have all the responsibility for IT systems, but none of the control. In this course we bring the two sides together by teaching the system administrator how to communicate effectively so that management will listen and understand. You can use the tactics presented here to increase your value in the organization and improve your marketability.

Make the business look good, and you become an important asset to your organization. Your management will appreciate these skills because you will be demonstrating your value in ways they understand and that empower them to make smart IT investment decisions. In turn, your professional credibility increases, putting you in a position to influence decisions impacting your role in the organization.

Who should attend:

IT people and sysadmins interested in taking their career to the next level, improving their relationship with senior management, and increasing their value and marketability.

Take back to work:

Skills to help you develop a productive relationship with your management.

Topics include:
  • How to approach management to ask for resources you need
  • Empowering management to make good IT decisions
  • Demonstrating the value of your work in a way that management will understand
  • Convincing management of the importance of time to innovate (R&D)
  • Reducing time spent fire-fighting (efficiencies and cost savings)
  • Growing organizational loyalty for your team
  • How to build the perception that you are customer-focused and mission-oriented
  • Ways to communicate the benefits of supporting a strong IT presence
    • Increased organizational competitiveness
    • Increased employee productivity
    • Cost avoidance and efficiencies
    • Risk management
    • Knowledge and information management
  • How to develop a collaborative relationship with your management that enables both sides to be successful

David Nalley, Apache CloudStack/Fedora

David Nalley (S5, M6, M11, T6) is a recovering systems administrator with 10 years of experience, plus an additional 3 years as a consultant. David is a committer on the Apache CloudStack project. In the Fedora Project, David is the maintainer of a few dozen software packages, is a sponsor for new packagers in Fedora, and is currently serving on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks at IT and F/LOSS conferences.

Cloud Computing
T6
Networking in the Cloud Age
New!
David Nalley, Apache CloudStack/Fedora

9:00 am–12:30 pm

Nautilus 1

Networking has been relatively static for decades. We've seen increases in speed, but many of the traditional topologies are inherently limited. Innovative networks are quite different. If you look at public services, such as AWS, or large private cloud deployments, you see that their networking topology looks contrary to everything that's been standard for years. In this half-day class we'll reexamine what limitations there are and what innovative options exist to remove those limitations.

Who should attend:

Advanced system or network admins with a deep understanding of L2/L3 networking who want to learn about the new networking technologies that enable networks to scale.

Take back to work:

Knowledge of emerging networking standards and where they are best used.

Topics include:
  • How massive public and private clouds build their networks to ensure scalability
  • How software defined networks work
  • Technologies worth looking at: VXLAN, NVGRE, GRE
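Of the encapsulations listed, VXLAN (RFC 7348) illustrates why these technologies scale: it tunnels L2 frames over UDP and replaces the 12-bit VLAN ID with a 24-bit VXLAN Network Identifier (VNI), allowing roughly 16 million isolated segments instead of 4,096. A sketch of the 8-byte VXLAN header:

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: I flag set, 24-bit VNI, reserved bits zero."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte 0x08 ("VNI present") followed by 3 reserved bytes
    # Word 2: 24-bit VNI in the high bytes, followed by 1 reserved byte
    return struct.pack(">I", 0x08000000) + struct.pack(">I", vni << 8)

header = vxlan_header(5001)
```

In a real deployment this header sits between an outer UDP datagram and the encapsulated Ethernet frame; the outer IP/UDP addressing is what lets the overlay cross ordinary routed networks.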

Jason Faulkner, Rackspace

Jason Faulkner (T7), a network engineer for the email and applications division of Rackspace, is responsible for maintaining Linux firewalls and load balancers for millions of business email users. He is a current member of LOPSA and an active contributor to the keepalived project. Outside of his daily responsibilities, he has also maintained the computer history website oldos.org since 2003.

T7
Real-World Insights on How to Secure and Route Your Linux Network
Updated!
Jason Faulkner, Rackspace

9:00 am–12:30 pm

Nautilus 2

We will talk about how to configure IP addresses and static routes, how to use iptables for security and convenience and the iproute2 utilities for advanced routing and IP configuration, and some "gotchas" that can happen with Linux networks. You'll learn why you should throw away ifconfig on Linux in favor of ip, and how to create and optimize iptables rule sets.

You will leave this class with enough IPv6 knowledge to kick-start your transitions or understand your preexisting IPv6 network. We'll also discuss how the loss of NAT doesn't mean your network can't be secure.

Who should attend:

Novice and intermediate Linux system and network administrators, or anyone who wants to understand the ins and outs of networking on Linux. A basic understanding of the OSI model and Linux utilities is recommended but not required. This class is especially well suited for administrators making the leap from Windows to Linux.

Take back to work:

Ways to expertly secure and route your Linux servers on an IP network.

Topics include:
  • Basics
    • Networking review
    • Basic network configuration: Sysconfig-based configuration, ifconfig/route, iproute2
    • Other useful network troubleshooting tools: ethtool, sysctl -a, ip route get (ip address)
  • Firewalls
    • High-level firewall architecture
    • ip[6]tables: differences between iptables and ip6tables, common use cases, how traffic flows through iptables, targets and matches you never knew existed
  • Advanced topics
    • Optimization
    • Complex rulesets
    • Custom chains
  • Other interesting things
    • Multiple routing tables
    • IPv6 tunneling
    • Balancing among multiple ISPs
    • Network tuning using sysctls
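A recurring theme in the topics above is how the kernel decides where a packet goes: among all routes that match a destination, the most specific (longest) prefix wins, which is exactly what `ip route get` reports for a given address. A small sketch of that lookup using Python's standard `ipaddress` module (the routing table and next-hop names are invented for illustration):

```python
import ipaddress

def route_lookup(table, destination):
    """Return the next hop of the most specific (longest-prefix) matching route."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if dst in net]
    if not matches:
        return None  # no route to host (this table does have a default route)
    return max(matches, key=lambda m: m[0].prefixlen)[1]

table = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-uplink"),  # default route
    (ipaddress.ip_network("10.0.0.0/8"), "lan"),
    (ipaddress.ip_network("10.1.2.0/24"), "dmz"),
]
```

Multiple routing tables, as covered in the class, simply mean the kernel first picks *which* table to consult (via policy rules) before running this same longest-prefix match.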

Half Day Afternoon

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella (S3, M4, M9, T8) has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

T8
Documentation Techniques for Sysadmins
Mike Ciavarella, Coffee Bean Software Pty Ltd

1:30 pm–5:00 pm

Nautilus 3

This class covers practical techniques for producing and maintaining the documentation your systems need. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

Who should attend:

System administrators who need to produce documentation for the systems they manage.

Take back to work:

The ability to make immediate, practical use of these documentation techniques.

Topics include:
  • Why system administrators need to document
  • The document life cycle
  • Targeting your audience
  • An adaptable document framework
  • Common mistakes in documenting
  • Tools to assist the documentation process

 


Thomas A. Limoncelli, Google, Inc.

Thomas A. Limoncelli (M5, M10, T9) is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Google in NYC.

Guido Trotter, Google, Inc.

Guido Trotter (T9), a Senior Systems Engineer at Google, has worked as a core Ganeti developer and designer since 2007. He is also a regular conference speaker, having presented at LISA, Fosdem, Linuxcon, Debconf, and other open source and community gatherings. He mostly speaks about Ganeti, virtualization in the open source world, and Linux networking features for virtualized environments.

Cloud Computing
T9
Ganeti: Your Private Virtualization Cloud "the Way Google Does It"
New!
Thomas A. Limoncelli, Google, Inc.

Guido Trotter, Google, Inc.

1:30 pm–5:00 pm

Nautilus 4

This tutorial is also available for live streaming. 

Ganeti is a cluster virtualization system developed mostly at Google but used by many organizations worldwide. Businesses and groups can leverage the power of Ganeti to easily and effectively manage fleets of physical hosts and use them to schedule virtual machine guests.

Reasons for choosing Ganeti include that it is very lightweight, simple to install and manage, and doesn't demand special storage hardware.

Who should attend:

System engineers interested in using virtualization and cloud technologies efficiently to consolidate systems and decouple physical hardware resources from virtual systems. Ideal participants are proficient with Linux/UNIX system administration and may already be using some virtualization technologies, but want to achieve a higher level of scalability for their systems by employing a cluster management technology such as Ganeti, without the need to invest money in specialized hardware resources such as SANs.

Take back to work:

The knowledge needed to create and maintain your own Ganeti cluster, to provide an IaaS cloud or virtualized services.

Topics include:
  • Setting up and managing a Ganeti cluster
  • Ganeti internals: how to make changes
  • Monitoring your cluster and dealing with failure
  • Ganeti as a back end
  • Typical and atypical use cases


Jacob Farmer, Cambridge Computer Services

Jacob Farmer (T4, T10) is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Cloud Computing
T10
Backups, Archiving, and Life Cycle Management: Riding the Wave of Data Proliferation
Jacob Farmer, Cambridge Computer Services

1:30 pm–5:00 pm

Marina 2

Most IT organizations report exponential data growth over time, and whether your data doubles every year, every two years, or every five years, the simple fact remains that if your data capacities double, then both the capacity and the performance of your backup system must double. All of this doubling stresses traditional approaches to data management. Thus, it is no surprise that backup/recovery is one of the most costly and unforgiving operations in the data center. Meanwhile, most IT organizations also report that the vast majority of their unstructured data is seldom or never accessed. Files accumulate year after year, choking the backup systems and driving up costs.

This course explores two main ways to manage the data deluge: (1) optimize backup systems by eliminating bottlenecks, streamlining operations, and bulking up backup infrastructure; and (2) manage the life cycles of unstructured data so that files that are not in active use can be managed separately from files that are in active use. We start by offering a simple framework for defining business requirements and comparing solutions at a high level. We then delve into the various mechanisms for lifecycle management and for eliminating backup system bottlenecks. Some time is spent exploring storage systems that have built-in mechanisms for data protection and lifecycle management.

Who should attend:

System administrators involved in the design and management of backup systems and policymakers responsible for protecting their organization's data.

Take back to work:

Ideas for immediate, effective, inexpensive improvements to your backup systems and a vision for how you might deploy a lifecycle management system that fits your organization.

Topics include:
  • Formulating strategies for data protection and lifecycle management
  • Identifying and addressing backup system bottlenecks
  • Managing fixed content
  • Hierarchical storage management and data migration
  • In-band versus out-of-band approaches to file lifecycle management
  • Breathing new life into tape storage
  • Deduplication: separating hype from reality
  • Object-based storage models for backup and archiving
  • Self-healing and self-protecting storage systems
  • Leveraging the cloud for backup and archiving
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David N. Blank-Edelman (T11, R2) is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl (the second edition of the Otter book), available at purveyors of fine dead trees everywhere. He has spent the past 25+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of LISA '05 and was one of the LISA '06 Invited Talks co-chairs. David is honored to have been the recipient of the 2009 SAGE Outstanding Achievement Award and to serve on the USENIX Board of Directors.

T11
Implementing WordPress for SysAdmins
New!
David N. Blank-Edelman, Northeastern University College of Computer and Information Science


1:30 pm–5:00 pm

Nautilus 5

Savvy sysadmins know that WordPress is no longer just a blogging engine. In more and more cases it hits a sweet spot for people who need a simple content management system to manage their Web site. It is powerful, relatively easy to use, doesn't require a consultant to set up, ties into a whole bunch of other services, is part of a thriving ecosystem, oh, and is (mostly) free.

If you ever get asked to create a Web site for your company, a non-profit you volunteer for, your religious community, a friend's business, an upcoming conference, or the like, you owe it to yourself to add WordPress to your toolbox. This class will teach you, from one sysadmin to another, how to implement WordPress to handle all of these scenarios.

Who should attend:

Anyone who has recently been asked to spin up a new Web site or manage an existing WordPress site.

Take back to work:

A sysadmin's understanding of WordPress and the confidence to implement it the right way.

Topics include:
  • How to choose the best way to host and develop your WordPress site
  • How to choose among the kerjillions of WordPress themes available
  • The five plug-ins you won't want to be without
  • Ways to extend WordPress to make it sing and dance the way you want it to
  • How to keep your WordPress installation as safe as possible
  • Where to find the best tips and tricks to further your WordPress knowledge

We'll go over all the steps a respectable sysadmin would take from first download to final deployment, including all of those testing/staging steps in between that non-sysadmins tend to forget. People with all levels of experience with WordPress are welcome. If you've already done a little work with WP, come extend and share your knowledge. After you take this class, don't be surprised if the next Web site you implement is running WordPress.

Additional Materials:

Theodore Ts'o, Google

Theodore Ts'o (T12, R1) has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

T12
Recovering from Linux Hard Drive Disasters
Theodore Ts'o, Google


1:30 pm–5:00 pm

Nautilus 1

Ever had a hard drive fail? Ever kick yourself because you didn't keep backups of critical files, or you discovered that your regular nightly backup didn't succeed?

Of course not: everybody keeps regular backups and verifies them to make sure they are successful. But for those people who think they might nevertheless someday need this information, this tutorial will discuss ways of recovering from storage disasters caused by failures somewhere in the hardware or software stack.

Who should attend:

Linux system administrators and users.

Take back to work:

How to recover from storage disasters caused by failures somewhere in the hardware or software stack.

Topics include:
  • How data is stored on hard drives
  • Recovering from a corrupted partition table
  • Recovering from failed software RAID systems
  • Low-level techniques to recover data from a corrupted ext2/3/4 filesystem when backups aren't available
  • Using e2image to back up critical ext2/3/4 filesystem metadata
  • Using e2fsck and debugfs to sift through a corrupted filesystem
  • Preventive measures to avoid needing to use heroic measures
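The e2image and debugfs steps listed above can be rehearsed safely before disaster strikes. A minimal sketch using standard e2fsprogs commands on a throwaway image file (the temporary path is invented for illustration; never practice on the only copy of a damaged disk):

```shell
# Build a throwaway ext2 filesystem inside an ordinary file.
IMG=$(mktemp /tmp/ext2-demo-XXXXXX)
dd if=/dev/zero of="$IMG" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$IMG"

# e2image saves the critical metadata (superblock, group descriptors, inode tables).
e2image "$IMG" "$IMG.e2i" 2>/dev/null

# debugfs opens the filesystem read-only for inspection.
debugfs -R 'stats' "$IMG" 2>/dev/null | head -n 4
```

The same commands work against a real device node such as /dev/sdb1 when run as root.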
Additional Materials:
Wednesday, December 12, 2012

Full Day

Alan Robertson, IBM Linux Technology Center

Alan Robertson (W1) founded the High-Availability Linux (Linux-HA) project in 1998 and led the project for ten years; it has since become the Pacemaker project. He worked for SuSE for a year, then worked in IBM's Linux Technology Center for five years. Alan continues to work on Linux and UNIX systems for IBM on a variety of projects. His most recent open source project is the extremely scalable, discovery-driven Assimilation Monitoring Project.

Before joining SuSE, he was a Distinguished Member of Technical Staff at Bell Labs. He worked for Bell Labs for twenty-one years, in a variety of roles, among which were providing leading-edge computing support, writing software tools, and developing voicemail systems.

W1
Pacemaker and Linux-HA: World-Class High Availability Software
Alan Robertson, IBM Linux Technology Center


9:00 am–5:00 pm

Marina 4

The Linux-HA project (http://linux-ha.org/), together with its child project, Pacemaker, is the oldest and most powerful open source high-availability (HA) package available, comparing favorably to well-known commercial HA packages. This software runs on a variety of POSIX-like systems, including FreeBSD, Solaris, and OS X.

Pacemaker+Linux-HA provides highly available services on clusters from one to more than 16 nodes with no single point of failure. These services and the servers they run on are monitored. If a service should fail to operate correctly, or a server should fail, the affected services will be quickly restarted or migrated to another server, dramatically improving service availability.

Pacemaker supports rules for expressing dependencies between services, and powerful rules for locating services in the cluster. Because these services are derived from init service scripts, they are familiar to system administrators and are easy to configure and manage.
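As an illustration of what such rules look like, here is a sketch in the crm configuration shell; the resource names and IP address are invented, and exact syntax varies between Pacemaker releases:

```
primitive vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.122.10 op monitor interval=30s
primitive web lsb:apache2 op monitor interval=60s

# Keep the web server on the node holding its address, and start the address first.
colocation web-with-vip inf: web vip
order vip-before-web inf: vip web
```

The lsb: resource class wraps an ordinary init script, which is why init-style services are easy to bring under cluster management.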

Who should attend:

System administrators and IT architects who architect, evaluate, install, or manage critical computing systems. It is suggested that participants have basic familiarity with System V/LSB-style startup scripts, shell scripting, and XML. Familiarity with high availability concepts is not assumed.

Take back to work:

Both the basic theory of high availability systems and practical knowledge of how to plan, install, and configure highly available systems using Linux-HA and Pacemaker.

Topics include:
  • General HA principles
  • Installation of the Linux-HA and Pacemaker software
  • Configuration overview
  • Overview of commonly used resource agents
  • Managing services supplied with init(8) scripts
  • Sample configurations for Apache, NFS, DHCP, DNS, and Samba
  • Writing and testing resource agents conforming to the Open Cluster Framework (OCF) specification
  • Creating detailed resource dependencies
  • Creating co-location constraints
  • Writing resource location constraints
  • Causing failovers on user-defined conditions
Additional Materials:

Bob Hancock, Sirguey-Hancock, Ltd.

Bob Hancock (W2) is a principal in Sirguey-Hancock, Ltd., a consulting company in New York City. He has spoken throughout the US and Europe on using parallelism and concurrency to build scalable and fast applications in Python. He is the manager of the Google Developer Group—New York and a co-organizer of NYC Python. At Pycon 2012 his talk, "Optimizing Performance with Parallelism and Concurrency," was packed and can be seen at http://www.youtube.com/watch?v=ULdDuwf48kM. You can follow his writings at bobhancock.org and the Open Source project of the implementation of the xmeans algorithm for clustering unstructured data at https://github.com/bobhancock/goxmeans.

W2
The Python Programming Language
Bob Hancock, Sirguey-Hancock, Ltd.


9:00 am–5:00 pm

Marina 6

We will take a practical tour of Python, with an emphasis on using the language to solve problems in data analysis, performance, systems administration, and network programming. The emphasis will be on solving real-world problems.

Although this will not be a lab, you should have Python 3.3 installed on your laptop if you want to follow along.  Go to python.org to download and install the version appropriate for your operating system.

Who should attend:

Programmers who want to use the Python programming language and understand how it can be applied to practical problems in data analysis, system administration, systems programming, and networking. No prior Python knowledge is required, but attendees should already be experienced programmers in at least one other programming language such as C, C++, Go, Java, Perl, or Ruby.

Take back to work:

An understanding of the Python language and how it can be used to solve your problems on a daily basis.

Topics include:
  • The Python language
    • Basic syntax
    • Core datatypes
    • Control flow and exception handling
    • Functions, generators, and co-routines
    • Modules and classes
    • Testing
    • C and Go extensions
    • The different versions of Python
  • Major library modules
    • Text processing
    • Operating system calls
    • Network programming: core and third-party
    • Web programming: client and simple server
  • Practical programming examples
    • Text parsing
    • Data analysis
    • Processing and analyzing large log files
    • Performance considerations
    • Interacting with the operating system
    • Interacting with Web services
    • Network programming
    • Threads, futures, and co-routines
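As a taste of the generator style covered under "Functions, generators, and co-routines," here is a sketch of lazy log processing; the log format is invented for illustration:

```python
# Generators let each stage process one record at a time,
# so even very large log files never have to fit in memory.
def parse_lines(lines):
    """Yield (level, message) pairs from 'LEVEL: message' lines."""
    for line in lines:
        level, _, message = line.partition(": ")
        yield level, message.strip()

def only_errors(records):
    """Keep just the messages whose level is ERROR."""
    return (msg for level, msg in records if level == "ERROR")

log = [
    "INFO: service started",
    "ERROR: disk quota exceeded",
    "INFO: heartbeat ok",
    "ERROR: connection reset",
]
errors = list(only_errors(parse_lines(log)))
print(errors)  # ['disk quota exceeded', 'connection reset']
```

With open('huge.log') in place of the list, the same pipeline streams a multi-gigabyte file one line at a time.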
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick (W3) is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

SuperSysadmin
W3
Root Cause Analysis
New!
Stuart Kendrick, Fred Hutchinson Cancer Research Center


9:00 am–5:00 pm

Seabreeze

Troubleshooting is hard. I don't claim to be an expert at either doing it or teaching it. On the other hand, I have several decades of experience wielding packet analyzers, debuggers, and log parsers and have accumulated various strategies that I believe you'll find useful. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 3–5, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate, on about a half-hour cycle, between coming together as a class and working in groups. During class time, I describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. I bring a dozen baseball caps emblazoned with Sys Admin or Storage Admin or End-User and will role-play those personas as needed.

An example: You ask the sysadmin to reboot the server. Meh, OK, the server has rebooted, but after a couple of minutes, the CPU utilization is pegged at 100% again. What do you want to do next?

BYOL (Bring Your Own Laptop) loaded with Wireshark and a graphics viewer (PDF and PNG) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Draft deck visible at:
http://www.skendric.com/problem/rca/Root-Cause-Analysis-LISA-2012.pdf

Who should attend:

System administrators and network engineers tasked with troubleshooting multidisciplinary problems.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies, e.g.:

  • Hourly Data Transfer Fails—Every hour, an application at the clinic wakes up, contacts its partner at a central hospital, and exchanges data, thus keeping the patient databases synchronized. Several times a day, this process fails, alerting the database administrator with the helpful message "A Network Error has occurred."
  • Many Applications Crash—Outlook crashes, Word documents fail to save, Windows Explorer hangs: The office automation applications servicing ~1500 users intermittently report a range of error messages. Suspicion falls on the mass-storage device hosting home and shared directories.
  • Slow Downloads—Intermittently, both internal and external users see slow downloads from the public Web site. Is it the load balancer or the firewall?
Additional Materials:
Thursday, December 13, 2012

Full Day

Theodore Ts'o, Google, Inc.

Theodore Ts'o (T12, R1) has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

R1
Linux Performance Tuning
Theodore Ts'o, Google, Inc.


9:00 am–5:00 pm

Seabreeze

The Linux operating system is commonly used both in the data center and for scientific computing applications; it runs in embedded systems as small as a wristwatch, as well as on large mainframes. As a result, the Linux system has many tuning knobs, so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workloads on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
    • Latency vs. throughput
    • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
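Many of the measurement tools covered are thin layers over /proc. As a minimal sketch (Linux-only, and field availability varies by kernel version), memory figures can be read directly:

```python
# Parse /proc/meminfo into a {field: kilobytes} dictionary.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # first token is the value in kB
    return info

m = meminfo()
print(m["MemTotal"], "kB total,", m["MemFree"], "kB free")
```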
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David N. Blank-Edelman (T11, R2) is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl (the second edition of the Otter book), available at purveyors of fine dead trees everywhere. He has spent the past 25+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of LISA '05 and was one of the LISA '06 Invited Talks co-chairs. David is honored to have been the recipient of the 2009 SAGE Outstanding Achievement Award and to serve on the USENIX Board of Directors.

R2
A Day Over the Edge in System Administration
David N. Blank-Edelman, Northeastern University College of Computer and Information Science


9:00 am–5:00 pm

Marina 2

It's time to learn how to break the rules, abuse the tools, and generally turn your system administration knowledge inside out. This class is a cornucopia of ideas for creative ways to take the standard (and sometimes not-so-standard) system administration tools and techniques and use them in ways no one would expect. We'll also cover some tools you may have missed.

Note: The teacher takes no responsibility should your head explode during this class.

Who should attend:

Old-timers who think they've already seen it all and those who want to develop inventive thinking early in their career. Join us and be prepared to be delighted, disgusted, and amazed. Most of all, be ready to enrich your network and system administration by learning to be different.

Take back to work:

New approaches to old problems, along with some ways to solve the insolubles.

Topics include:
  • How to (ab)use perfectly good network transports by using them for purposes never dreamed of by their authors
  • How to improve your network services by intentionally throwing away data
  • How PowerPoint karaoke can make you a better sysadmin
  • And much, much more!
Additional Materials:
Friday, December 14, 2012

Full Day

Tom Wheeler, Cloudera

Tom Wheeler's (F1) career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace, and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design and implement a high-volume data processing system for WebMD, and served as senior programmer/analyst for a brokerage firm. He now works as a Curriculum Developer at Cloudera, a company that helps organizations derive value from their data through products, consulting, training, certification, and support for Apache Hadoop and related tools.

F1
Apache Hadoop for Operations Staff
New!
Tom Wheeler, Cloudera


9:00 am–5:00 pm

Marina 2

Systems are increasingly automated, and we're generating information faster than ever before. Data storage capacity has risen while prices have fallen, finally making it possible to store and analyze vast amounts of information in order to gain valuable insight into customer behavior, important trends, and new opportunities.

At the forefront of the Big Data revolution is Apache Hadoop, an open source system that harnesses the power of commodity hardware to achieve large-scale data storage and processing. This class will begin with a solid overview of Apache Hadoop and related tools, then move into a more detailed discussion of interest to those who want to learn more about security and performance considerations, as well as the planning, installation, monitoring, and maintenance of a production Hadoop cluster.

Who should attend:

System administrators and network operations staff; architects and IT management who have a technical background; developers who are interested in a semi-technical introduction to Hadoop.

Take back to work:

What you need to get started with Hadoop.

Topics include:
  • Why the world needs Hadoop
  • What Hadoop is and what it can do
  • Overview of the Hadoop ecosystem
  • The architecture behind Hadoop
  • Hadoop cluster planning
  • Hadoop cluster operation
Additional Materials:
F2
Puppet
9:00 am–5:00 pm

Nautilus 4

We will take students through a condensed version of the three-day Puppet Master training: describing the Puppet resource model, introducing the Puppet language, writing and testing Puppet modules, and deploying the Puppet Master and Dashboard. To get the most out of this class, students should bring a laptop running either VMware or VirtualBox.

Who should attend:

System administrators who are interested in deploying Puppet to subdue the chaos in their infrastructure.

Take back to work:

A thorough understanding of what you need to know and do to deploy Puppet in your organization.

Topics include:
  • Describing system state via Puppet Resource
  • Organizing resources in Puppet modules with classes and defines
  • Writing custom facts and functions
  • Automating configuration of network services
  • Deploying Puppet Master and Dashboard
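To give a flavor of the resource model described above, here is a minimal sketch in Puppet's own language; the module and service names are illustrative, not part of the course material:

```puppet
# Desired state, not commands: Puppet converges the system toward this description.
class ntp {
  package { 'ntp':
    ensure => installed,
  }
  file { '/etc/ntp.conf':
    ensure  => file,
    source  => 'puppet:///modules/ntp/ntp.conf',
    require => Package['ntp'],           # install the package before managing its config
  }
  service { 'ntpd':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/ntp.conf'],  # restart when the config changes
  }
}
```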

 

Additional Materials:

Mark Burgess, CFEngine, Inc.

Mark Burgess (T5, F3) is the founder, chairman, CTO, and principal author of CFEngine. In 2011 he resigned as Professor of Network and System Administration at Oslo University College, where for twenty years he led the way in theory and practice of automation and policy-based management. In the 1990s he underlined the importance of idempotent, autonomous desired state management ("convergence") and formalized cooperative systems in the 2000s ("promise theory"). He is the author of numerous books and papers on network and system administration, including the USENIX Short Topics books A System Engineer's Guide to Host Configuration and Maintenance Using Cfengine, co-authored with Æleen Frisch, and A Sysadmin's Guide to Navigating the Business World, co-authored with Carolyn Rowland. He has won several prizes for his work.

F3
Configuration Management Solutions with CFEngine 3
Mark Burgess, CFEngine, Inc.


9:00 am–5:00 pm

Marina 6

Following a complete rewrite of CFEngine with its popular new syntax and powerful pattern matching capabilities, this full-day class presents an introduction suitable for new users, as well as for users of CFEngine 2. The class is peppered with configuration examples, which can now be self-contained and modularized to an unprecedented degree in the new language.

Who should attend:

Anyone with a basic knowledge of configuration management who is interested in learning the next-generation tool.

Take back to work:

An understanding of the new features of the completely rewritten CFEngine 3, including its new syntax and benefits.

Topics include:
  • Moving from ad hoc scripts to automation
  • The importance of convergence
  • The Promise model
  • Templates and data types
  • Quickstart configuration
  • Creating configuration libraries
  • Upgrading from CFEngine 2
  • Example configurations and demos
  • Achieving compliance with standards and regulations
  • CFEngine on Windows and the Registry
  • Monitoring and self-healing
  • Brief overview of the community and commercial CFEngine roadmap
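As a flavor of the rewritten language, here is a minimal, self-contained CFEngine 3 sketch of a convergent file promise; the path and permissions are illustrative:

```cfengine3
body common control
{
  bundlesequence => { "config_files" };
}

bundle agent config_files
{
  files:
    # Promise: this file exists with these permissions; cf-agent repairs any drift.
    "/etc/motd"
      create => "true",
      perms  => p_644;
}

body perms p_644
{
  mode => "644";
}
```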

 

Additional Materials:
Tuesday, February 12, 2013

Half Day Morning

Jeff Darcy, Red Hat

Jeff Darcy (T1) has worked on network and distributed storage problems for twenty years, including an instrumental role in developing MPFS (a precursor of modern pNFS) while at EMC and leading the HekaFS project more recently. He is currently a member of the GlusterFS architecture team at Red Hat, coordinating the integration of HekaFS's features and leading the asynchronous-replication development effort.

T1
Building a Cloud Storage System
Jeff Darcy, Red Hat


9:00 am–12:30 pm

Crystal Room

The trend toward moving computation into the cloud has resulted in new expectations for storage in the cloud. This tutorial will provide information necessary to build your own cloud-appropriate storage system.

Who should attend:

Primarily, people who wish to implement their own task-specific cloud storage systems. Secondarily, those who wish to understand why existing cloud storage systems have been designed the way they are, and what tradeoffs they have made to achieve their respective goals.

Topics include:
  • New requirements: Application-level users of cloud storage have come to expect a variety of data and consistency/ordering models well beyond those provided by traditional file, block, or relational-database systems.
  • New constraints: Systems deployed in the cloud are often characterized by low levels of trust (user/user and user/provider) and lack of hardware access or configuration flexibility.
  • Techniques: Implementing a system to meet these new requirements and constraints will require a thorough knowledge of cluster and distributed-system techniques such as vector clocks, Merkle trees, Bloom filters, and various kinds of append-only storage.
  • Case studies: Existing systems representing successful use of these techniques will be examined.
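As a small taste of the techniques named above, here is a sketch of a Bloom filter, a compact set-membership structure with no false negatives; the sizes and hash counts are arbitrary illustration values:

```python
import hashlib

class BloomFilter:
    """Probabilistic set: 'in' may rarely report a false positive, never a false negative."""
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a big integer used as a bit array

    def _positions(self, item):
        # Derive k bit positions from SHA-256 of (salt, item).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("object-3")
print("object-3" in bf)  # True: members are always found
```

In a cloud store, such a filter lets a node cheaply test "might this replica hold the key?" before paying for a disk read or network round trip.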
Additional Materials:

James S. Plank, University of Tennessee

Jim Plank (T2) is a professor in the EECS department at the University of Tennessee. He has done research on fault-tolerant computing and storage systems for over 20 years. For the past eight years, his sole focus has been on the design, implementation, and performance of erasure codes in storage systems. He has published numerous papers on the topic, including a very popular tutorial on Reed-Solomon codes and a complete treatment of Minimum Density codes for RAID-6. His open-source libraries for Galois Field arithmetic and for general erasure-coding have been in widespread use by industry and academia.

Cheng Huang, Microsoft Research

Cheng Huang (T2) is a researcher at Microsoft Research, Redmond. He has worked extensively on erasure coding and invented technologies that have been incorporated in a wide variety of Microsoft products, such as in Lync for smooth video conferencing and in Xbox for bounding communication latency between consoles and the cloud. His latest work is LRC, a new class of erasure codes deployed in Windows Azure Storage, which saves the Microsoft Cloud millions of dollars (see http://research.microsoft.com/en-us/news/features/erasurecoding-090512.aspx) and also received the best paper award at USENIX ATC '12.

T2
Erasure Coding for Storage Applications
James S. Plank, University of Tennessee

Cheng Huang, Microsoft Research


9:00 am–12:30 pm

Gold Room

From disk arrays through clouds to archival systems, storage systems must tolerate failures and prevent data loss. Erasure coding provides the fundamental technology for storage systems to add redundancy and tolerate failures. This tutorial will cover the fundamentals of erasure coding, the mechanics of many erasure codes that apply to today's storage systems, and the properties of various erasure codes designed for a variety of storage scenarios.

Topics include:
  • General matrix-based codes, starting with classic Reed-Solomon codes
  • Galois Field arithmetic for erasure-coding, and how to implement it efficiently
  • RAID-6 codes: RDP, EVENODD, Minimum Density, X-Code
  • More general codes implemented with only the XOR operation: Generalized RDP/EVENODD
  • Cauchy Reed-Solomon codes
  • Open source library support for erasure codes
  • The reconstruction problem and techniques to reduce bandwidth and I/O
  • Regenerating codes
  • Practical MDS codes with efficient reconstruction: Rotated Reed-Solomon 
  • Practical non-MDS codes with efficient reconstruction and their application in cloud storage: Pyramid codes, LRC and its deployment in Windows Azure Storage, PMDS
  • Erasure coding for Flash
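As a small taste of the XOR-based codes listed above, here is a hedged Python sketch of single-parity encoding and single-erasure recovery — the simplest special case (RAID-4/5 style); the RAID-6 codes covered in the tutorial, such as RDP and EVENODD, tolerate two erasures:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def encode_parity(data_blocks):
    # Single parity block = XOR of all data blocks (RAID-4/5 style).
    return xor_blocks(data_blocks)

def recover_one(surviving_blocks, parity):
    # Any one missing data block is the XOR of the parity and the survivors,
    # because XOR is its own inverse.
    return xor_blocks(surviving_blocks + [parity])
```

Losing block `b` from blocks `a, b, c` with parity `p = a^b^c` is repaired as `b = a^c^p`; the more general codes in the tutorial extend this idea with Galois Field arithmetic.
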
Additional Materials:

Half Day Afternoon

Sudipta Sengupta, Microsoft Research

Dr. Sudipta Sengupta (T3) is currently at Microsoft Research, where he is working on data center systems and networking for cloud computing, non-volatile memory for cloud/server applications, data deduplication, and storage virtualization. Previously, he spent five years at Bell Laboratories, the Research Division of Lucent Technologies. His work on primary data deduplication will ship as a new feature in Windows Server 2012. His work on flash-memory based key-value stores has been incorporated in the data processing and serving pipeline for multiple properties in Microsoft's Bing system. He designed the network topology and routing algorithms for VL2, a low-cost, flexible, and agile next generation data center network, which has been deployed in Microsoft's cloud data centers.

Dr. Sengupta received the IEEE William R. Bennett Prize and the IEEE Leonard G. Abraham Prize for his work on oblivious routing of Internet traffic. At Bell Labs, he received the President's Teamwork Achievement Award for technology transfer of research into Lucent products. At Microsoft, he received the Gold Star Award which recognizes excellence in leadership and contributions for Microsoft's long term success.

Dr. Sengupta has taught advanced courses/tutorials at many academic/research and industry conferences. He has published 75+ research papers in some of the top conferences, journals, and technical magazines. He has authored 45+ patents (granted or pending) in the area of computer systems, storage, and networking. Dr. Sengupta received a Ph.D. and an M.S. from MIT (Cambridge, MA, USA) and a B.Tech. from IIT-Kanpur (India), all in Computer Science. He was awarded the President of India Gold Medal at IIT-Kanpur for graduating at the top of his class across all disciplines.

T3
Data DeDuplication: Technologies, Trends, and Challenges
Sudipta Sengupta, Microsoft Research


1:30 pm–5:00 pm

Crystal Room

This tutorial will introduce the state of the art in data deduplication systems for storage. Most of the material will be presented in a self-contained manner, although we expect attendees to have some background in the basic concepts of storage systems.

The storage market is witnessing unprecedented growth, with enterprise storage growing 50–60% per year and cloud storage growing even faster. Data deduplication is the #1 feature for which customers ask when they invest in storage solutions. Data deduplication detects and eliminates redundancies in data, with the benefits applying to both storage capacity savings ("data at rest") and network bandwidth savings ("data on wire"). In addition to taming the growth in storage total-cost-of-ownership, the storage capacity savings can help to make high IOPS devices like flash-based SSDs more feasible in terms of cost. The network bandwidth savings can help to mitigate WAN bottlenecks, thus enabling user-to-cloud and hybrid private-public cloud storage scenarios.

Backup data deduplication has been around for about a decade, championed by early startups in the space such as Data Domain. Recent developments bring data deduplication to the more expensive and faster primary storage tier, where deduplication space savings are more valuable, translating to reductions in the amount of data that needs to be replicated, geo-replicated, cached, backed up, and transferred over the network.

In this tutorial, we will survey technologies in the data deduplication area at both the algorithmic and systems levels. We will follow the progression of ideas over time and identify current trends in research and industry. We will outline the challenges that need to be addressed going forward. Topics covered will include research aspects of the entire data deduplication pipeline—data chunking, data indexing, primary data access, storage maintenance operations—as well as case studies of commercially deployed systems.
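As a rough illustration of the chunking-and-indexing stages of that pipeline, here is a hypothetical Python sketch using fixed-size chunks and a SHA-256 chunk index (all names are illustrative; production systems typically use content-defined chunking rather than fixed-size chunks):

```python
import hashlib

def dedup_store(data, chunk_size=8, index=None):
    """Split data into fixed-size chunks, storing each unique chunk once.

    Returns (recipe, index): the recipe is the ordered list of chunk hashes
    needed to reconstruct `data`; the index maps hash -> chunk bytes.
    """
    index = {} if index is None else index
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        index.setdefault(h, chunk)   # duplicate chunks are stored only once
        recipe.append(h)
    return recipe, index

def rebuild(recipe, index):
    # Reassemble the original data from its recipe of chunk hashes.
    return b"".join(index[h] for h in recipe)
```

The capacity saving is the gap between the logical size (length of the recipe) and the physical size (number of unique chunks in the index).
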

Who should attend:

Graduate students and researchers working in the areas of storage, enterprise computing, cloud computing, and enterprise/Web services; practicing storage professionals in the technology industry, especially in enterprise and cloud data center space.

Additional Materials:

Sandeep Uttamchandani, VMware

Dr. Sandeep Uttamchandani (T4) is the Technical Director for Storage at VMware. Sandeep has worked on a wide variety of enterprise storage products and technologies and has been closely involved in the operational management of petabyte-scale, business-critical deployments. Sandeep holds 22 issued patents and has 28 peer-reviewed publications in key storage conferences, including FAST, USENIX ATC, and SIGMOD. Prior to VMware, Sandeep was the Chief Architect for Advanced Storage Technologies at IBM GTS, where he was responsible for shaping the technical storage strategy for a $10B services business. Previously, Sandeep was a Master Inventor at the IBM Storage Research Center at Almaden. He holds a Master's and a PhD in Computer Science from the University of Illinois at Urbana-Champaign (UIUC).

T4
Design Trade-offs of CAP Theorem and Beyond: Understanding Implications of Design Choices in a Software-defined Shared Nothing Storage Architecture
Sandeep Uttamchandani, VMware


1:30 pm–5:00 pm

Gold Room

The key objective of this tutorial is to provide an understanding of how the design choices made for the key building blocks (such as metadata service, replication, locking, etc.) impact the overall properties of a shared nothing storage architecture, and how to map those choices to the application data model and storage workload requirements.

Eric Brewer coined the CAP theorem to convey that the design of a scale-out system involves trade-offs. CAP is commonly oversimplified to mean that between Consistency, Availability, and Partition tolerance, only two of the three attributes can be realized in a system. In general, the architecture of any shared nothing scale-out storage involves a collection of design choices and trade-offs that ultimately dictate the observable behavior of the system. Following are some choices involved in the design of a shared nothing storage solution:

  • Data locality versus cluster scalability?
  • Master versus masterless metadata architectures?
  • Locking versus multi-version concurrency control?
  • Strong versus eventual versus weak consistency?
  • Replication versus RAID?
  • Node-to-node communication: UDP versus TCP versus RDMA?
  • Two-phase commit versus Paxos versus Multi-Paxos?
  • In-memory data grids versus disk-based DAS architectures?
  • Data models: ACID versus BASE (Basically Available, Soft state, Eventually consistent)?

We will start the tutorial with a bare-bones skeleton of the architecture, then incrementally populate the building blocks. For each building block, we discuss popular design choices, followed by an interactive discussion on the implications of mix-and-match of these building blocks (for example, matching coarse-grained data sharding for better data locality performance, with appropriate patterns for scaling and distributed data recovery). The tutorial assumes a basic knowledge of distributed systems. Additionally, to better appreciate the under-the-hood exploration, we expect an awareness of the cloud storage landscape, and a high-level understanding of the popular solutions.
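
One of the consistency trade-offs above can be made concrete with quorum arithmetic: with N replicas, write quorum W, and read quorum R, every read overlaps the latest write whenever R + W > N. A hypothetical Python sketch (not from the tutorial; the "stale" ends of the replica list are chosen deliberately to show the worst case):

```python
def write(replicas, w, version, value):
    # Write to a quorum of W replicas; the rest stay stale (no repair here).
    for rep in replicas[:w]:
        rep["version"] = version
        rep["value"] = value

def read(replicas, r):
    # Read a quorum of R replicas; return the highest-versioned value seen.
    # Reading from the *other* end of the list models the worst-case quorum.
    return max(replicas[-r:], key=lambda rep: rep["version"])["value"]

# N = 3 replicas, write quorum W = 2.
replicas = [{"version": 0, "value": None} for _ in range(3)]
write(replicas, w=2, version=1, value="new")
```

With R = 2 the read quorum must intersect the write quorum (2 + 2 > 3) and returns "new"; with R = 1 (1 + 2 = 3, no guaranteed overlap) the read may return the stale initial value.
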

Who should attend:

Storage architects, engineers, administrators, and students who are interested in a deep dive into the building blocks and design patterns of software-defined shared nothing storage architectures (a.k.a. cloud storage).

Additional Materials:
Tuesday, April 2, 2013

Full Day

Vicraj Thomas, BBN Technologies and GENI Project Office (GPO)

Dr. Vicraj (Vic) Thomas is responsible for Distributed Systems and Security for the GENI Project Office (GPO). His expertise lies in the areas of highly dependable systems, distributed systems protocols and architectures, and wireless sensor networks. Vicraj is the System Engineer supporting the GENI Experimenter Workflow and Services working group. Before joining BBN Technologies, Vic was with the Honeywell Laboratories, where he developed technologies related to wireless networks for flight-essential avionics applications. Dr. Thomas was the Industrial Technology Area Leader for the sensor networks research area within the US-UK International Technology Alliance (ITA) program on Network Science. He also was a systems architect on the NASA C3I network for the Orion program. Vic earned a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Mumbai. He earned an M.S. in Computer Science from the University of Rhode Island and a Ph.D. in Computer Science from the University of Arizona.

Niky Riga, BBN Technologies and GENI Project Office (GPO)

Niky is responsible for supporting GENI users in integrating and deploying their experiments within the GENI infrastructure, and ensuring that the deployment makes the best use of GENI resources.

Before joining the GPO in 2010, Niky worked on innovative projects within the Network Research department of BBN. Her focus was designing and prototyping pioneering transport services for Mobile Ad-hoc Networks.

Niky earned a Diploma in Electrical and Computer Engineering at the National Technical University of Athens, and an MS degree in Computer Science at Boston University.

 

Using the GENI Testbed for Network and Distributed System Experiments
New!
Vicraj Thomas, BBN Technologies and GENI Project Office (GPO)


 
Niky Riga, BBN Technologies and GENI Project Office (GPO)


 

9:00 am–5:00 pm

The Global Environment for Network Innovations, GENI, is a suite of research infrastructure rapidly taking shape in prototype form across the United States. It is sponsored by the National Science Foundation, with the goal of providing a laboratory environment for networking and distributed systems research and education. It is well suited for exploring networks at scale, thereby promoting innovations in network science, security, services, and applications.

This tutorial will describe GENI as an experimenter tool and offer attendees the chance to gain hands-on experience in using the testbed. We will use the multi-campus Layer 2 network to design and run simple Layer 2 and Layer 3 experiments.

The tutorial will interleave presentations with hands-on exercises. Attendees will get a chance to create and run GENI experiments by following simple instructions, working at their own pace with help from the instructors.

Who should attend:
  • Graduate students and researchers working in areas of computer networking, future Internet architectures, distributed systems, and cloud computing
  • Professors of computer networks and distributed systems who would like to use a large-scale testbed in their classes
  • Anyone who is interested in Future Internet Architecture or distributed cloud computing
  • Anyone who wants to get acquainted with GENI, a platform for running network experiments, and gain hands-on experience in using it
Additional Materials:
Sunday, November 3, 2013

Full Day

Rik Farrow, Security Consultant

Rik Farrow began working with UNIX system security in 1984 and with TCP/IP networks in 1988. He taught his first security class in 1987 and started teaching internationally the following year. He has been a consultant since 1980 and has advised both firewall and intrusion detection companies in the design of their products. Rik has published two books, one on UNIX security and the other on system administration. He designed a five-day, hands-on class in Internet security for internal use by the Department of Defense. He wrote the "Network Defense" column for Network Magazine for over six years and is currently the editor of ;login:, the USENIX magazine. Rik lives with his wife in the high desert of northern Arizona, where he continues to work and do research, and he still ventures out to teach.

Linux
S1
Securing Linux Servers
Rik Farrow, Security Consultant


9:00 am–5:00 pm

Lincoln 2 Room

Linux servers start out very secure; it's what you do with them when you use them that can create insecurities. A straight Linux server install runs minimal services and has few users and a very restrictive firewall, which is a great security posture but is pretty useless for most purposes. As users are added, services enabled, and holes punched through the firewall, the security can deteriorate quickly.

This class will show you how to maintain a strong security posture through careful configuration and proper use of Linux tools and services. Linux contains tools and software that can be enabled to slow brute-force attacks against user accounts, can notice when your accounts have weak passwords or are under attack, can keep services and software up to date, and can sandbox applications to prevent even zero-day attacks. The class will focus on attacks most recently seen, including attacks on mail and Web servers.

Who should attend:

Linux system administrators and security managers familiar with Linux system administration, whether you manage a handful of Linux systems or large clusters.

Take back to work:

Techniques for securing and maintaining Linux servers.

Topics include:
  • Minimizing risk with appropriate restrictions
  • Managing and tracking application vulnerabilities
  • Sandboxing to prevent attacks
  • Monitoring logfiles
  • Updates and configuration management
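To give a flavor of the log-monitoring topic above, here is a hypothetical Python sketch that counts failed SSH logins per source address, the kind of signal tools like fail2ban act on (the log format shown is typical OpenSSH syslog output; the threshold and sample addresses are illustrative):

```python
import re
from collections import Counter

# Matches typical OpenSSH failure lines, capturing the source address.
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")

def failed_logins_by_ip(lines, threshold=5):
    """Return source IPs with at least `threshold` failed password attempts."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[m.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

In practice such a count feeds an alert or a temporary firewall rule against the offending address.
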
Additional Materials:

John Looney, Google

John Looney graduated in Computer Applications from Dublin City University and specialized in supporting high-end commercial UNIX environments. Noticing that Irish system administrators had no formal training opportunities, he set up the Irish Linux User Group to provide a community and free monthly training sessions on the latest open source technologies to over 400 Irish engineers. He ran LinuxWorld Dublin in 2000. While at Hosting365 he built Ireland's largest shared hosting infrastructure, serving 30,000 customers on a shoestring budget.

Today, he's a Site Reliability Engineer responsible for Google's cluster infrastructure; initially the cluster fabric, GFS and Chubby, and more recently the datacenter automation and remote access technologies. He has built a five-month full-time graduate program to take junior engineers and retrain them to take the pager for Google.com. John is on the Computing Committee for Engineers Ireland.

DevOps
S2
SRE Classroom: Non-Abstract Large System Design for Sysadmins
New!
John Looney, Google


9:00 am–5:00 pm

Thurgood Marshall North Ballroom

Truly large-scale systems are still rare; in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system and thus outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who also can fill in the detail are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to an SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data-security impact the cost of implementation.

This class will not cover concurrent programming or the software-engineering side of distributed systems.

Who should attend:

System administrators, SREs, and DevOps who have some familiarity with distributed systems, server hardware, and systems programming, especially those who would like to work with, procure, or build large distributed systems.

Take back to work:

The ability to design large distributed systems, to evaluate design proposals, and to explain such designs to third parties.

Topics include:
  • Design patterns for large distributed systems
  • Monitoring large-scale systems
  • Large-scale design workshop and presentations
  • Non-abstract design; taking a design and producing a "bill of materials"
  • Designing for failure; how to work around rack, networking, and datacenter failures
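The "bill of materials" step above is ultimately back-of-envelope arithmetic; here is a hypothetical Python sketch of the kind of estimate involved (the function name and all figures are illustrative assumptions, not numbers from the class):

```python
import math

def servers_needed(peak_qps, qps_per_server, replication=3, headroom=0.75):
    """Estimate a server count for a target load.

    Serve the peak load with each machine at `headroom` utilization (leaving
    capacity for failures and spikes), then multiply by the replication
    factor to survive machine, rack, and site outages.
    """
    serving = math.ceil(peak_qps / (qps_per_server * headroom))
    return serving * replication

# e.g. 100k QPS peak, 500 QPS per server at 75% target utilization, 3 replicas:
servers = servers_needed(100_000, 500)
```

Feeding an SLA into the headroom and replication parameters is exactly the sort of requirement-to-hardware translation the class practices.
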
Additional Materials:

Half Day Morning

Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

Coding
S3
PowerShell Fundamentals
Updated!
Steven Murawski, Stack Exchange


9:00 am–12:30 pm

Thurgood Marshall South Ballroom

Do you find yourself repeating a task often? Do you have to retrieve information or change settings on a number of servers or users at once or regularly? Do you find clicking repetitively just too slow to keep up?

If you answered any of these questions with a "Yes," don't miss this half-day class. We will cover a number of ways to make you more productive in less time—and it is far easier than you may think possible.

Who should attend:

System administrators and anyone else who wants to be more productive on the Microsoft Windows platform.

Take back to work:

Usable commands and patterns to make attendees more effective in working with the Windows platform, along with familiarity with the discovery patterns in PowerShell, so that they can continue to develop their skills.

Topics include:
  • Introduction to PowerShell
  • Finding the commands you need
  • What's in the help files
  • Discovering hidden gems in command output
  • Working from the shell
  • Navigating the file system, registry, and more
  • Working with objects (everything in PowerShell is an object)
  • Working with servers and workstations
  • Discovering WMI/CIM
  • Working with text
  • Dealing with remote machines via WMI/CIM and PowerShell Remoting
  • PowerShell Versions 2 and 3, and the upcoming Version 4
Additional Materials:

Shumon Huque, University of Pennsylvania

Shumon Huque (http://www.huque.com/~shuque/) is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

Linux
S5
DNSSEC: DNS Security Extensions
New!
Shumon Huque, University of Pennsylvania


9:00 am–12:30 pm

Wilson C Room

This class will provide system administrators with a detailed understanding of the DNS Security Extensions (DNSSEC). It will provide practical information about configuring DNSSEC using the popular ISC BIND DNS software and will cover both using DNSSEC to cryptographically sign your own DNS zones and configuring DNS resolvers to validate DNSSEC signatures. Many examples of DNS/DNSSEC querying and debugging using the "dig" tool and other diagnostic tools and programs will also be covered. The last part of the course will cover prospects for newer and more exciting uses of DNSSEC by application protocols that are in the pipeline, such as DANE and TLSA records.

Who should attend:

System administrators and engineers who are tasked with providing DNS services, as well as anyone interested in knowing more about how DNS and DNSSEC work.

Take back to work:

A detailed understanding of DNSSEC with the basic knowledge to configure and deploy it.

Topics include:
  • DNS protocol basics
  • DNSSEC extensions and how they work
  • Configuring, deploying, and troubleshooting DNSSEC
  • Application uses of DNSSEC
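As one small, concrete piece of the protocol covered above: a signed zone typically publishes two kinds of DNSKEY records, distinguished by flag bits defined in RFC 4034. A hedged Python sketch of that decoding (the function name is ours):

```python
ZONE_KEY = 0x0100  # DNSKEY Zone Key flag (RFC 4034, bit 7)
SEP      = 0x0001  # Secure Entry Point flag, set on key-signing keys

def dnskey_role(flags):
    """Classify a DNSKEY record by its flags field."""
    if not flags & ZONE_KEY:
        return "not a zone key"
    return "KSK" if flags & SEP else "ZSK"
```

These are the flag values seen in `dig DNSKEY <zone>` output: 257 marks the key-signing key (KSK) that the parent's DS record points at, and 256 marks the zone-signing key (ZSK) that signs the zone's records.
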
Additional Materials:

John Sellens, SYONEX

John Sellens has been involved in system and network administration since 1986 and is the author of several related USENIX papers, a number of ;login: articles, and the USENIX Short Topics book #7, System and Network Administration for Higher Reliability. He holds an M.Math. in computer science from the University of Waterloo and is a Chartered Accountant. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the systems team at Magna International. From 1999 to 2004, he was the General Manager for Certainty Solutions in Toronto. Prior to joining Certainty, John was the Director of Network Engineering at UUNET Canada and was a staff member in computing and information technology at the University of Waterloo for 11 years.

S6
Nagios: Advanced Topics
John Sellens, SYONEX

John Sellens has been involved in system and network administration since 1986 and is the author of several related USENIX papers, a number of ;login: articles, and the USENIX Short Topics book #7, System and Network Administration for Higher Reliability. He holds an M.Math. in computer science from the University of Waterloo and is a Chartered Accountant. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the systems team at Magna International. From 1999 to 2004, he was the General Manager for Certainty Solutions in Toronto. Prior to joining Certainty, John was the Director of Network Engineering at UUNET Canada and was a staff member in computing and information technology at the University of Waterloo for 11 years.

9:00 am–12:30 pm

Thurgood Marshall West Ballroom

Nagios is a very widely used tool for monitoring hosts and services on a network. It's very flexible and configurable, and it can be extended in many ways using home-grown or existing extensions. This tutorial will cover the advanced features and abilities of Nagios and related tools, which are especially useful in larger or more complex environments, or for higher degrees of automation or integration with other systems.

Who should attend:

Network and system administrators ready to implement or extend their use of the Nagios system and network monitoring tool.

Take back to work:

The information you need to immediately implement and use the advanced features of Nagios and related tools for monitoring systems and devices on your networks.

Topics include:
  • Theory of operation
  • Configuration for more complex environments
  • Plug-ins: Their creation, use, and abuse
  • Extensions: NRPE, NSCA, NDOUtils
  • Add-ons: Graphing, integration with other tools
  • Abuse: Unexpected uses and abuses of Nagios
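
As a taste of the plug-in topic above: a Nagios plug-in is simply a program that prints one status line and reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A minimal sketch, written here as a shell function with illustrative thresholds, might check root-filesystem usage:

```shell
# Minimal Nagios-style check as a shell function (thresholds are examples);
# the return value maps to Nagios status: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
check_root_disk() {
    warn=${1:-80}    # WARNING above this percentage used
    crit=${2:-90}    # CRITICAL above this percentage used
    usage=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$usage" ]; then
        echo "DISK UNKNOWN - could not parse df output"; return 3
    elif [ "$usage" -ge "$crit" ]; then
        echo "DISK CRITICAL - ${usage}% used on /"; return 2
    elif [ "$usage" -ge "$warn" ]; then
        echo "DISK WARNING - ${usage}% used on /"; return 1
    fi
    echo "DISK OK - ${usage}% used on /"; return 0
}

# As a standalone plug-in, the script would end with:
#   check_root_disk 80 90; exit $?
```

Nagios runs such a check on a schedule and turns its exit code into host or service state.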
Additional Materials:

Half Day Afternoon

Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

Coding
S7
Building Your PowerShell Toolkit
New!
Steven Murawski, Stack Exchange

Steven is a sysadmin on the Site Reliability Engineering team for Stack Exchange (operators of great Q&A sites such as Server Fault and Stack Overflow) and a Microsoft MVP in PowerShell. Steven also leads two local user groups, the Greater Milwaukee IT Pro User Community and the Greater Milwaukee Script Club. He speaks regularly to local user groups and can be found at various conferences.

1:30 pm–5:00 pm

Thurgood Marshall South Ballroom

Have you found yourself writing little PowerShell scripts to help you day to day? Do you have to share these with co-workers? Do you remember what that script you wrote six months ago does or the parameters that it needs? Join me in learning how to take your PowerShell scripting to the next level.

Who should attend:

System administrators or anyone with basic PowerShell experience who needs to take it to the next level and create reusable, production-ready commands.

Take back to work:

A pattern for taking your basic scripts and functions and turning them into professional, pipeline-ready, production-oriented commands, and the ability to assemble modules, which are the method for organizing and distributing PowerShell commands.

Topics include:
  • Using basic commands
  • Developing scripts and functions
  • Packaging scripts and functions as a solution that is reusable, redistributable, and usable with the maximum variety of input data
  • Error handling
  • Creating help files
  • Working with the pipeline (for receiving input and processing data)
  • A variety of tips and tricks to make your functions robust and practical
  • The ins and outs of packaging your commands as modules, creating module manifests, and things to consider with module development
Additional Materials:

Richard Elling, Nexenta Systems

Richard Elling has been designing and building dependable, networked computer systems to solve complex problems for more than 25 years. He was an early adopter of ZFS and has developed benchmarks and analysis techniques for evaluation of data protection schemes and performability of systems. He wrote Designing Enterprise Solutions with Sun Cluster 3.0 (Prentice Hall, 2002) and has authored many white papers, Sun BluePrints, and refereed papers on dependable systems and performability. He is a regular contributor to the ZFS community and is currently the Director of Solution Engineering for Nexenta Systems.

Linux
S8
File Systems: Top to Bottom and Back
New!
Richard Elling, Nexenta Systems

Richard Elling has been designing and building dependable, networked computer systems to solve complex problems for more than 25 years. He was an early adopter of ZFS and has developed benchmarks and analysis techniques for evaluation of data protection schemes and performability of systems. He wrote Designing Enterprise Solutions with Sun Cluster 3.0 (Prentice Hall, 2002) and has authored many white papers, Sun BluePrints, and refereed papers on dependable systems and performability. He is a regular contributor to the ZFS community and is currently the Director of Solution Engineering for Nexenta Systems.

1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

Linux systems offer a staggering number of options for file systems. With a bit of work, you can get pretty much any legacy file system recognized. For current production needs, however, only a very few of those many choices actually matter: Ext4, Btrfs, and ZFS. This course will discuss these three file systems in gory detail and present the results of a variety of performance tests, run or supervised by the presenter, which were designed to test the file systems in real computing operations (as opposed to benchmark scenarios).

Who should attend:

System administrators who wish to understand the structure and performance of modern file systems from an in-use point of view. This is not a course for kernel hackers, though; people wanting an in-depth look at the VFS code will not have their needs met.

Take back to work:

An understanding of the benefits of and differences between the three file systems under a variety of realistic scenarios.

Topics include:
  • Virtual file system (VFS) overview
  • Ext4 structure and goals
  • Btrfs structure and goals
  • ZFS structure and goals
  • Performance results
Additional Materials:

Michele Chubirka, Packetpushers

Michele Chubirka, aka "Mrs. Y.," is a recovering UNIX engineer working in network security. She is also the host of the Healthy Paranoia podcast, the security feed of Packetpushers, and official nerd hunter. She likes long walks in hubsites, traveling to security conferences, and spending time in the Bat Cave. She sincerely believes that every problem can be solved with a "for" loop. When not blogging or podcasting, she can be found using up her 15 minutes in the Twittersphere or Google+ as @MrsYisWhy.

Joe Weston, Consultant and founder of Respectful Confrontation method

Joe Weston is a workshop facilitator, consultant, and author of the book Mastering Respectful Confrontation. He is also the founder of the Heartwalker Peace Project, which creates opportunities for connection, discussion, and creative collaboration.

Soft Skills
S9
A New Model: Advancing Organizational Security Through Peacebuilding
New!
Michele Chubirka, Packetpushers

Michele Chubirka, aka "Mrs. Y.," is a recovering UNIX engineer working in network security. She is also the host of the Healthy Paranoia podcast, the security feed of Packetpushers, and official nerd hunter. She likes long walks in hubsites, traveling to security conferences, and spending time in the Bat Cave. She sincerely believes that every problem can be solved with a "for" loop. When not blogging or podcasting, she can be found using up her 15 minutes in the Twittersphere or Google+ as @MrsYisWhy.

 
Joe Weston, Consultant and founder of Respectful Confrontation method

Joe Weston is a workshop facilitator, consultant, and author of the book Mastering Respectful Confrontation. He is also the founder of the Heartwalker Peace Project, which creates opportunities for connection, discussion, and creative collaboration.

1:30 pm–5:00 pm

Wilson C Room

Why is the security industry so full of fail? We spend millions of dollars on firewalls, IPS, IDS, DLP, professional penetration tests and assessments, and vulnerability and compliance tools, and at the end of the day, the weakest link is the user and his or her inability to make the right choices. It's enough to make a security engineer cry.

The one thing you can depend upon in an enterprise is that many of your users, even with training, will still make the wrong choices. They will violate BYOD restrictions, click on links they shouldn't, respond to phishing scams, open documents without thinking, post too much information on Twitter and Facebook, use their pets' names as passwords, etc. But what if this isn't because users hate us or are too stupid? What if all our ignored policies and procedures regarding security best practices have more to do with our failure to understand modern neuroscience and the human mind's resistance to change?

Humans are wired to be emotional beings. Emotions influence most of our decisions, good and bad. In failing to understand how this is at the root of user non-compliance, no matter how much money we spend on expensive hardware and software, we will fail to achieve the goal of good organizational security.

Attendees will have the opportunity to practice various communication methods and conflict resolution skills that will improve their ability to motivate change in their organizations.

Who should attend:

Sysadmins and security professionals who need tools to improve user security compliance in their organizations.

Take back to work:

Critical mind hacks for improving your professional lives as well as your relationships with others, achieved through "hacking the human," not by software or hardware.

Topics include:
  • The affective neuroscience behind risky behavior
  • Collaborative communication methods and conflict resolution skills to crowdsource the goal of organizational security
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

Linux
S10
Build a Sysadmin Sandbox
New!
David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

1:30 pm–5:00 pm

Thurgood Marshall West Ballroom

Every sysadmin I know has at least four new technologies they would like to play with—and that's just before breakfast. But finding the time to do this can be difficult, often because of the effort it takes to find a machine, set up the network connection, firewall it from the other machines, and so on... If you want to try something that requires multiple components, that's even worse.

What you need is a sandbox: a place where you can experiment with new technologies, mess with cool tools, test the latest versions of software, prototype new services, and so on, without any sort of heavyweight effort.

Who should attend:

Sysadmins who want to test technologies safely and efficiently.

Take back to work:

The ability to build your own sysadmin sandboxes.

Topics include:
  • Using software and services like Git, Vagrant, and Amazon EC2 to make sandbox construction easy and free/inexpensive
  • When a sandbox approach is and is not appropriate
  • Migrating your work in and out of sandboxes
  • How to share the work you've done in a sandbox with others
Additional Materials:
Monday, November 4, 2013

Full Day

Shumon Huque, University of Pennsylvania

Shumon Huque http://www.huque.com/~shuque/ is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

Cloud System Administration
M1
Using and Migrating to IPv6
Updated!
Shumon Huque, University of Pennsylvania

Shumon Huque http://www.huque.com/~shuque/ is the Director of Engineering, Research, and Development for the University of Pennsylvania's Networking and Telecommunications division and also serves as the Lead Engineer for the MAGPI GigaPoP. He is involved in network engineering, systems engineering, and the design and operation of key infrastructure services at Penn (DNS, DHCP, authentication, email, Web, VoIP, Directory, etc.). He holds Bachelor's and Master's degrees in computer science from Penn. In addition to his day job, Shumon teaches (part-time) a lab course on advanced network protocols at Penn's Engineering School.

Shumon is the principal IPv6 architect at Penn and has been running production IPv6 networks and services for almost a decade.

9:00 am–5:00 pm

Wilson C Room

The Internet is facing imminent exhaustion of its pool of IPv4 addresses. IPv6, the next-generation Internet Protocol, is designed to address this problem, among other things. If you have not yet started to deploy IPv6, now is the time. The "World IPv6 Launch" event took place in June 2012, and major industry players such as Google, Facebook, Comcast, Yahoo!, Akamai, and Cisco already support IPv6, with many others coming on board in the near future. You need to be next!

This tutorial will provide a detailed introduction to IPv6 and will also cover practical aspects of configuring and using IPv6 in networks, operating systems, and applications. Basic knowledge of IPv4 is assumed, but no prior knowledge of IPv6 is necessary. The course will go over the basics of IPv6 and dive into the details you need.
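
By way of illustration, on a Linux host the course's configuration topics map onto commands like these (a sketch using the iproute2 tools; what they print depends entirely on your host's configuration):

```shell
# Skip gracefully where iproute2's ip command is unavailable
command -v ip >/dev/null 2>&1 || { echo "ip not installed"; exit 0; }

ip -6 addr show     # IPv6 addresses assigned to each interface
ip -6 route show    # the IPv6 routing table
ip -6 neigh show    # the neighbor cache, IPv6's analogue of the ARP table
```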

Who should attend:

System administrators, network administrators, and application developers who need to prepare for migration to IPv6, and anyone who wants a general introduction to IPv6 and what is involved in deploying it.

Take back to work:

An understanding of IPv6, with the basic knowledge to begin designing and deploying IPv6 networks, systems, and applications.

Topics include:
  • The current IP address landscape and the motivation for IPv6
  • IPv6 addresses and protocol details
  • DHCPv6
  • DNS
  • Tunneling
  • Configuring hosts and application services
  • IPv6-related security topics
Additional Materials:

Branson Matheson, SGT

Branson is a 25-year veteran of system administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines, while continuing to support many open source projects. He founded sandSecurity to provide policy and technical audits, plus support and training for IT security, system administrators, and developers. Branson currently is a systems architect for NASA; has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

Linux
M2
Hands-on Security for System Administrators
New!
Branson Matheson, SGT

Branson is a 25-year veteran of system administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines, while continuing to support many open source projects. He founded sandSecurity to provide policy and technical audits, plus support and training for IT security, system administrators, and developers. Branson currently is a systems architect for NASA; has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

9:00 am–5:00 pm

Thurgood Marshall West Ballroom

System administrators often run into interesting conflicts between sysadmin, customer, and IT security needs. These conflicts generally lead to difficulty in achieving a balance between administrative convenience, good customer service, and minimal risk. There are processes or services available that can significantly improve any of these areas; however, many times they are costly or resource intensive. This course is designed for system administrators who want to improve the security posture of their organizations, using IT security standards paired with good system administration practices, resulting in better service, lower risk, and minimal impact to time and budget.

We will walk a path, covering many domains of IT security and enterprise system administration in ways that utilize the interconnection between good system administration practices and security fundamentals. We will discuss recent risks and threats, analyze them with respect to your environment, review possible impacts, and develop mitigations that you can apply immediately. Training includes instruction, discussion, many hands-on labs, and a strong dose of common sense.

Attendees should bring a laptop capable of running a Virtual Guest and will be provided a VM in which to work. The class will have access to a test network of systems for training and lab exercises. You will return to your organization with a toolbox of documentation, (mostly) free software, and a good starting point for developing better practices to improve security and system administration.

Who should attend:

Beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to improve their security. It is suggested that participants have experience with the *nix command line.

Take back to work:

Documentation, tips, tricks, and tools tailored to your environment that can be implemented to improve security posture, processes, and operations in your organization.

Topics include:
  • The relationship between system administration and IT security
  • Security theories, standards, and risk mitigation as applied by SAs
  • Information management using Trac and Subversion
  • Good system administration practices that directly improve IT security
  • Basic configuration-driven system management using Puppet
  • Host and network auditing, hardening, and monitoring
  • Developing an effective security awareness program
Additional Materials:

Half Day Morning

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Coding
M3
Core Skills: Scripting for Automation
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

9:00 am–12:30 pm

Thurgood Marshall North Ballroom

Automation is critical to surviving your system administration career with your sanity, hair, and systems intact. If you can automate some or all of a task, then you stand to make considerable gains in personal productivity, task repeatability, and system predictability.

So how can you achieve this state of nirvana? The answer is scripting.

This class is a practical crash course in how—using a combination of bash, Perl, and friends—you can write useful scripts that solve real-world system administration problems.

Please note that this is a hands-on class. A basic understanding of programming ("What's a loop?") and how to edit files in your favorite flavor of *nix are assumed. Attendees will need to bring a laptop with OS X, Linux, or FreeBSD installed to complete in-class tasks.

If there is sufficient interest, we will schedule a BoF for scripting challenges, where we can work them out interactively as a group.
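
As a flavor of the kind of problem the class tackles, here is a small sketch (the log path and the "Failed password ... from <ip>" message format are assumptions; adjust both for your systems) that chains grep, awk, sort, and uniq to summarize failed ssh logins by source address:

```shell
# Count failed ssh login attempts per source IP from an auth log
# (default path and log-line format are assumptions)
top_failed_logins() {
    log=${1:-/var/log/auth.log}
    grep 'Failed password' "$log" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' |
        sort | uniq -c | sort -rn | head -10
}

# Example: top_failed_logins /var/log/auth.log
```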

Who should attend:

Junior and intermediate sysadmins who are new to scripting or would like to create scripts to reliably automate sysadmin tasks.

Take back to work:

An understanding of how to apply standard utilities in your scripts, along with recipes for automating typical administration tasks.

Additional Materials:

David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

DevOps
M4
Building a Big IaaS Cloud: Building a Massively Scalable, Geographically Disparate, IaaS Cloud Using Apache CloudStack
David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

9:00 am–12:30 pm

Thurgood Marshall South Ballroom

In this half-day class, we'll give system administrators an understanding of IaaS clouds, with the reference implementation being Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud. Experience with virtualization platforms and a deep understanding of L2/L3 networking are preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud, based on Apache CloudStack, in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack
  • Next-generation network topologies, including SDN
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes
  • Maintaining availability in a "failure-is-assured" environment
Additional Materials:

Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

Linux
M5
How to Interview a System Administrator
Updated!
Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

9:00 am–12:30 pm

Harding Room

Do you know how to interview a system administrator? Do the questions you ask elicit specific, narrowly-focused information, or do they show you both the depth and breadth of a candidate's knowledge of a particular subject or technology? Do you know how to distinguish between a candidate who is just trying to bluff through the interview and one who has some knowledge of the field but hasn't yet become an expert? Are trick questions ever appropriate, and if so, when and why? Some questions shouldn't be asked, and some would even land you in hot water with your company's HR or legal department: do you know what those questions are? Finally, have you figured out how to help a candidate do well in an interview while still getting an objective and fair assessment of their skills?

If you answered "no" or even "I'm not sure" to any of these questions, this course is for you.

This tutorial will focus on techniques for interviewers, but even sysadmins who are just starting out will learn some things to use as an interviewee. Managers of system administrators and junior sysadmins will learn, among other things, how to interview someone who knows more than you do. Junior administrators will also learn how to respond (as an interviewee) when asked a bad question, in particular, how to turn it into a better question.

Who should attend:

System administrators of all levels of experience, as well as managers of system administrators.

Take back to work:

Increased confidence in your ability to weed out the posers and hire really great sysadmins.

Topics include:
  • Purposes of an interview
    • To assess the candidate's technical skills
    • To get a feel for the candidate's personality and interpersonal skills
    • To learn whether a candidate is likely to be a good fit with the company and with the IT group
    • To help the candidate figure out whether he wants this job and whether he is likely to do well in the position
    • Maybe even to teach the candidate something new about system administration
  • Basic questions to bear in mind
    • Is the candidate comfortable?
    • Does he need a drink or a bathroom break?
    • Does she know who you are and what your role in the company is?
  • Preparatory questions
    • What are you really trying to learn about the candidate's skills, and why?
    • What makes a good question good?
    • What makes a bad question bad?
    • How can you turn bad questions into good ones?
    • When is it appropriate to ask a trick question, and why?
    • What questions can't or shouldn't you ask?
Additional Materials:

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

Soft Skills
M6
Navigating the Business World for Sysadmins: The Trusted Adviser
New!
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

 
Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

9:00 am–12:30 pm

Thurgood Marshall East Ballroom

This tutorial provides guidelines and suggestions to improve the communication and collaboration of sysadmins and directly increase effectiveness when working with senior management.

This course is intended to provide you with skills that are associated with senior IT staff. The focus is on understanding the gap between IT operations and management and how to bridge that gap to increase respect for IT and improve your relationship with management. We will cover identifying your role within the organization, effective communication with upper management, positioning yourself to increase your visibility, and becoming the go-to person/group for IT within your organization.

Who should attend:

Sysadmins who want to learn specific skills that allow them to work more effectively with senior management within the organization.

Take back to work:

The ability to identify the management team relevant to IT in your organization; to plan branding, both individually and for your team; to compose effective emails that get results; to communicate IT priorities and goals effectively to management; to improve communication within the IT group; to translate business goals into IT priorities; to work collaboratively with management; to plan and conduct effective and efficient meetings; and to return to work and start building that trusted IT adviser relationship.

Topics include:
  • Examining the divide between management and IT
  • Defining "management" within your organization
  • How to communicate effectively between management and IT operations
  • Demonstrating the value of your work in a way that management will understand
  • Ways to communicate the benefits of supporting a strong IT presence
  • Knowledge and information management
  • How to develop a collaborative relationship with your management that enables both sides to be successful

Half Day Afternoon

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Linux
M7
Advanced Shell Programming
Mike Ciavarella, Coffee Bean Software Pty Ltd


1:30 pm–5:00 pm

Thurgood Marshall North Ballroom

The humble shell script is still a mainstay of UNIX/Linux system administration scripting languages, despite the wide availability of alternatives such as Perl, Python, TCL, and other similar languages. This class details techniques that move beyond the quick-and-dirty shell script.

Who should attend:

Intermediate system administrators or anyone with a solid knowledge of programming and with some experience in Bourne/Korn shells or their derivatives.

Take back to work:

An understanding of how to use the "lowly" shell to achieve lofty goals.

Topics include:
  • Common mistakes and unsafe practices
  • Modular shell script programming
  • Building blocks: awk, sed, etc.
  • Writing secure shell scripts
  • Performance tuning
  • Choosing the right utilities for the job
  • Addressing portability at the design stage
  • When not to use shell scripts
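Many of the topics above come down to a handful of defensive habits. A minimal sketch of that style (the `set` flags, `mktemp`, and `trap` usage shown here are common shell practice, not taken from the course materials):

```shell
#!/bin/sh
# Exit on the first failed command or use of an unset variable.
set -eu

# Private scratch file, cleaned up even if the script exits early.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT

# Quote every expansion so filenames containing spaces survive intact.
printf '%s\n' "one two" "three" > "$tmpfile"
while IFS= read -r line; do
    echo "got: $line"
done < "$tmpfile"
```

Run under any POSIX shell, this prints one "got:" line per record, whether or not the data contains spaces.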

David Nalley, Apache CloudStack

David Nalley is a recovering systems administrator of 10 years. David is a member of the Apache Software Foundation, and a Project Management Committee Member for Apache CloudStack. David is a frequent author for development, sysadmin, and Linux magazines and speaks at numerous IT conferences.

Chiradeep Vittal, Citrix Systems

Chiradeep Vittal is Principal Architect in the Cloud Platforms Group at Citrix Systems. He is a maintainer in the Apache CloudStack project, where he contributes to the networking and storage parts of the Infrastructure-as-a-Service (IaaS) management system. He was a founding engineer at Cloud.com, whose CloudStack product is now incubating at the Apache Software Foundation. CloudStack is deployed in more than 100 public and private clouds and powers some of the largest clouds in the world today.

Cloud System Administration
M8
Networking in the Cloud Age
David Nalley, Apache CloudStack

Chiradeep Vittal, Citrix Systems


1:30 pm–5:00 pm

Thurgood Marshall South Ballroom

Networking has been relatively static for decades. We've seen increases in speed, but many of the traditional topologies are inherently limited. Innovative networks are quite different. If you look at public services, such as AWS, or large private cloud deployments, you see that their networking topology looks contrary to everything that's been standard for years. In this half-day class we'll reexamine what limitations there are and what innovative options exist to remove those limitations.

Who should attend:

Advanced system or network admins with a deep understanding of L2/L3 networking who want to learn about new networking technologies that enable networks to scale.

Take back to work:

Knowledge of emerging networking standards and where they are best used.

Topics include:
  • How massive public and private clouds build their networks to ensure scalability
  • How software defined networks work
  • Technologies worth looking at: VXLAN, NVGRE, GRE

Adam Moskowitz

When he's not teaching LISA attendees how to conduct interviews or become a senior system administrator, Adam works as a software developer and senior system administrator, but only to support his hobby of judging barbecue contests and to keep food in his puppy's bowl.

Soft Skills
M9
Getting to Senior: Sysadmin Level IV and Beyond
New!
Adam Moskowitz


1:30 pm–5:00 pm

Harding Room

The USENIX Short Topics in System Administration book Job Descriptions for System Administrators is a fine work, and one might think that it could be used as a roadmap by sysadmins looking to advance their careers. However, the book limits itself to the words that might appear in a job posting, and is necessarily concise. Furthermore, the book emphasizes specific technical skills without much emphasis on the myriad "soft" skills a senior admin needs. A good senior system administrator needs to know more than just what's in the job description, and needs to know why these additional skills are required.

So what's a junior or mid-level sysadmin to do? Why, take this tutorial, of course!

Adam will go far beyond just reading the requirements listed in the book by providing an analysis of what is meant by the requirements and why a broad range of skills is often necessary to meet a single requirement. "Hard" skills will be balanced with "soft" skills such as professional ethics, job performance, and conduct in the workplace. To some attendees' dismay, Adam will also explain why a senior system administrator needs to be good at talking to both the least skilled users and the highest executives, and why being able to talk to these people is even more important for the most senior administrators than for the most junior.

Who should attend:

Junior and mid-level sysadmins who want advice and direction on how to advance their careers.

Take back to work:

The skills you need to learn, as well as an understanding of why you need to learn them and why many of those skills have more to do with business and management than system administration; ideas for how to go about learning what you need beyond this course.

Topics include:
  • "Hard" skills, "soft" skills
  • The LISA Job Description Levels
    • Working definitions
    • Aim of this talk in terms of these definitions
  • "Hard" skills
    • "System" knowledge
      • User commands
      • Administrative commands
      • Boot process, run levels, and init files
      • Technologies (RAID, back-ups, etc.)
        • Theory
        • Implementation
      • Configuration management (systems)
      • Networking
        • Hardware
        • Protocols
        • Applications
      • The kernel
        • Layout/data structures
        • Filesystems
        • Devices
    • Programming skills
      • Shell (and awk and sed and...)
      • Perl
      • Python, Ruby, and others
      • C and maybe assembler
      • Basic S/W engineering
        • Revision control
        • Configuration management (software)
      • Knowing when to get help
    • Standards
      • What's on your bookshelf
      • "Understanding the system"
  • "Soft" skills
    • The real role of a sysadmin
    • Understand business
    • Attitude and professionalism
      • Interacting with customers
    • Professional growth
      • Learning
      • Getting help
      • Conferences and training
  • Going beyond Sysadmin Level IV
    • "Slightly squishy" (not hard, not soft—or both)
    • Project management
    • System design and analysis
      • Technical
      • Business (cost/benefit)
    • Budgeting
    • More interacting with customers
    • Interacting with management
    • Professional growth (again)

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals and has served as co-chair of WiAC '12, WiAC '13, and CHIMIT '10, as well as on several LISA program committees. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government.

Nicole holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is a member of USENIX, ACM, AIS, AAA, LOPSA, and CSST. She is currently an Assistant Professor at Utah State University and her public work includes technical white papers, a patent, newsletter articles, and academic research papers. She has been a featured speaker at industry and academic events and was involved in the organization of the Silicon Valley Women in Tech group.

Carolyn Rowland, NIST

Carolyn Rowland began working with UNIX in 1986; her professional career as a UNIX system administrator took off in 1991. She currently leads a team of sysadmins at the National Institute of Standards and Technology (NIST), driving innovation and supporting research projects for the Engineering Laboratory. She believes we need to increase the visibility of system administration by engaging senior leadership and tying our efforts to the priorities of business. Carolyn is also Secretary of the USENIX Board of Directors and the LISA '12 Program Chair. In 2011–2012 Carolyn began to focus on the community of women in computing, including co-chairing the first-ever USENIX Women in Advanced Computing (WiAC) Summit. She continued as co-chair of WiAC in 2013.

Soft Skills
M10
Navigating the Business World for Sysadmins: Methods
New!
Nicole Forsgren Velasquez, Utah State University

Carolyn Rowland, NIST


1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

This tutorial provides guidelines and suggestions to support core business processes and directly increase effectiveness when working with senior management.

Who should attend:

Sysadmins who want to learn specific skills that allow them to work more effectively with senior management within the organization. This course is intended to provide you with skills that are associated with senior IT staff.

Take back to work:

The ability to write a basic capital proposal, including net present value analysis; to draft a budget, complete with impacts; to understand how to sell a technical concept to management and other non-techies; to prepare basic rationale for a technical capability with reference to a specific organizational goal; and to understand how metrics are used by management, as well as identify and calculate relevant metrics.

Topics include:
  • Navigating the capital expenditure process: understanding the capital cycle, how to write a capital proposal, and basic net present value analysis
  • Budgeting 101: the importance of the budget, how to draft a budget, selling your budget to management, and the importance of including impacts
  • How to sell an IT concept to non-IT people (e.g., agile development, DevOps, cloud computing)
  • Positioning IT as a valuable asset to the organization: understanding your organization's core business and linking the contribution of IT to that business (e.g., if security is a primary focus for the company, ensure that IT is protecting the organization's critical assets)
  • Untangling IT metrics: the importance of metrics, selecting the right metric, developing your own, and using metrics effectively
  • Managing resources: projecting resource allocation within IT, prioritization and strategic planning, and big-picture view of IT's contributions
Tuesday, November 5, 2013

Full Day

Guido Trotter, Google

Guido Trotter, a Senior Systems Engineer at Google, has worked as a core Ganeti developer and designer since 2007. He is also a regular conference speaker, having presented at LISA, Fosdem, Linuxcon, Debconf, and other open source and community gatherings. He mostly speaks about Ganeti, virtualization in the open source world, and Linux networking features for virtualized environments.

Helga Velroyen, Google

Helga has been a Google Software Engineer since 2011 and a member of the Ganeti core team since 2012. Inside Ganeti she implemented better network support, enhancements of storage handling, and various bug fixes. In her life before Google she worked at a biotech company, bringing medical analysis robots to life. She is a member of the German Chaos Computer Club and has given talks at their conferences, including one at 28c3 about hearing aid technology and one at the OpenChaos Cologne about C#/.Net. At university her focus was on data-mining and pattern recognition; she participated successfully in various data-mining competitions.

Cloud System Administration
T1
Ganeti: Your Private Virtualization Cloud "the Way Google Does It"
Updated!
Guido Trotter, Google

Helga Velroyen, Google


9:00 am–5:00 pm

Thurgood Marshall South Ballroom

Ganeti is a cluster virtualization system developed mostly at Google but used by many organizations worldwide. Businesses and groups can leverage the power of Ganeti to easily and effectively manage fleets of physical hosts and use them to schedule virtual machine guests.

Reasons for choosing Ganeti include that it is very lightweight, it is simple to install and manage, and it doesn't demand special storage hardware.

Who should attend:

System engineers interested in using virtualization and cloud technologies efficiently to consolidate systems and decouple physical hardware resources from virtual systems. Ideal participants are proficient with Linux/UNIX system administration and may already be using some virtualization technologies, but want to achieve a higher level of scalability for their systems by employing a cluster management technology such as Ganeti, without the need to invest money in specialized hardware resources such as SANs.

Take back to work:

The knowledge needed to create and maintain your own Ganeti cluster, to provide an IaaS cloud or virtualized services.

Topics include:
  • Setting up and managing a Ganeti cluster
  • Ganeti internals: how to make changes
  • Monitoring your cluster and dealing with failure
  • Ganeti as a back end
  • Typical and atypical use cases

Half Day Morning

Geoff Halprin, The SysAdmin Group

Geoff Halprin has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the USENIX Short Topics in System Administration book A System Administrator's Guide to Auditing. He was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

Soft Skills
T2
The First Hundred Days
Geoff Halprin, The SysAdmin Group


9:00 am–12:30 pm

Wilson C Room

Starting at a new company gives you exactly one chance to do things differently, to establish a new approach, realign the team's efforts, and change the tenor of the discussion. To be effective in your new role as leader, you must establish a rapport with the other departments and with your team and take control of the work pipeline.

You've made the leap. You're about to start at a new company, maybe as a senior sysadmin, maybe as a team leader or manager. Now you're asking yourself, "What do I do first?"

Moving to a different company or being promoted internally gives you a unique opportunity to put aside a site's history and look at it with a fresh set of eyes. If you're hired as the new team lead or manager, knowing how you will get to know the new site and get on top of any site consolidation activities is critical to your longevity in the role.

This class discusses the various aspects of moving from a tactical (bottom-up) view of system administration to a strategic (top-down) view. We cover the initial site survey, the first steps of identifying and cauterizing open wounds, and the process of systemic review and improvement.

Who should attend:

Anyone starting or contemplating a new position, including making an in-house move that enables you to start over; anyone with a new boss who wants to understand and help that boss; anyone about to apply for a senior position who wants to take control of the interview process.

Take back to work:

A set of tools and perspectives that will help you evaluate an environment, from company structure and pain points to IT systems and team skills, and help you engage a team in improvements to the department.

Topics include:

Part 1: A topical view

  • Organizational awareness: The boss, the company
  • The team: Assessing the team; first repairs
  • Building the work pipeline; second repairs
  • Systems and processes: Workflow management, change management, event management
  • Round 1: Cauterizing the wound
  • Round 2: Some early wins
  • Round 3: The Big Three
  • Systemic improvement programs

Part 2: A temporal view

  • The first day
  • The first week
  • The first month
  • The first hundred days

Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

Linux
T3
Next-Generation Storage Networking: Beyond Conventional SAN and NAS
Jacob Farmer, Cambridge Computer Services


9:00 am–12:30 pm

Harding Room

There has been tremendous innovation in the data storage industry over the past few years.  New storage architectures have come to market to challenge traditional SAN and NAS products with nimble new designs that are much better suited to serving the increasingly virtual nature of applications and server infrastructure. Meanwhile, the allure of cloud computing and the emergence of affordable enterprise-class solid state storage devices have inspired ever more innovative approaches to storage caching, tiering, and deduplication. This lecture is a survey of the latest trends and advances in the data storage industry. We trace the I/O path from application to storage media and look at a wide variety of solutions to the ever-changing challenges of data storage.

Who should attend:

System administrators running day-to-day operations, enterprise architects, and storage administrators. This tutorial is technical in nature, but it does not address command-line syntax or the operation of specific products or technologies. Rather, the focus is on general architectures and the different ways they tackle various storage management challenges.

Take back to work:

A better understanding of modern storage architectures, various approaches to scaling in both performance and capacity, and a framework for comparing and contrasting various types of storage solutions.

Topics include:
  • The storage I/O path and the fundamentals of storage virtualization
  • Application acceleration with solid state storage devices (SSDs)
  • Automated tiered storage and information life cycle management (ILM)  
  • Deduplication of primary storage 
  • Object storage models and content-addressable storage 
  • Leveraging the cloud for primary storage

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

Linux
T4
Recovering from Linux Hard Drive Disasters
Theodore Ts'o, Google


9:00 am–12:30 pm

Thurgood Marshall North Ballroom

Ever had a hard drive fail? Ever kick yourself because you didn't keep backups of critical files, or discovered that your regular nightly backup didn't succeed?

Of course not: everybody keeps regular backups and verifies them to make sure they are successful. But for those people who think they might nevertheless someday need this information, this tutorial will discuss ways of recovering from storage disasters caused by failures somewhere in the hardware or software stack.

Who should attend:

Linux system administrators and users.

Take back to work:

How to recover from storage disasters caused by failures somewhere in the hardware or software stack.

Topics include:
  • How data is stored on hard drives
  • Recovering from a corrupted partition table
  • Recovering from failed software RAID systems
  • Low-level techniques to recover data from a corrupted ext2/3/4 filesystem when backups aren't available
  • Using e2image to back up critical ext2/3/4 filesystem metadata
  • Using e2fsck and debugfs to sift through a corrupted filesystem
  • Preventive measures to avoid needing to use heroic measures
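The e2image and debugfs workflow in the bullets above can be rehearsed safely before a real disaster by practicing on a file-backed filesystem image; the file name and size below are arbitrary choices for illustration:

```shell
# Create a throwaway ext2 filesystem inside a plain file (no root needed).
dd if=/dev/zero of=practice.img bs=1M count=8 2>/dev/null
mke2fs -q -F practice.img

# Back up only the critical metadata (superblock, group descriptors,
# inode tables); the .e2i file is much smaller than a full disk image.
e2image practice.img practice.e2i

# Inspect the superblock read-only with debugfs.
debugfs -R stats practice.img 2>/dev/null | head -n 4
```

On a real failed disk you would point the same commands at the device node instead of practice.img.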

Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

Soft Skills
T5
Advanced Time Management: Team Efficiency
Thomas A. Limoncelli, Stack Exchange


9:00 am–12:30 pm

Thurgood Marshall West Ballroom

Attendees must bring a laptop for in-class exercises. Techniques will apply to any wiki or collaborative document system, but the labs and demos will be done using Google Apps.

Who should attend:

All sysadmins who want to collaborate efficiently within their team and with others (even solo sysadmins will benefit!).

Take back to work:

Techniques to help your IT team work better, faster, and more transparently.

Topics include:
  • Collaborate online with collaborative documents, shopping lists, and Kanban tools
  • Go from "firefighting" to working on more interesting projects
  • Create meetings that don't suck
  • Hold team members accountable for tasks they take on
  • Cross train team members more effectively with fire drills (DevOps "Game Day" exercises)
  • Train people on new services more effectively
  • Eliminate problems due to inconsistent processes
  • Send more effective emails
  • Launch new services without "oops"
  • Document how things work so everyone can go on vacation

Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

DevOps
T6
Continuous Integration with Jenkins
New!
Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

9:00 am–12:30 pm

Thurgood Marshall East Ballroom

Does your software development project need a butler? Is your development team guilty of writing great code while only sporadically testing it? Does pushing your codebase to production make your programmers call in sick? Enter Continuous Integration with Jenkins. Learn how this open source project can improve your development by consistently and continuously automating your software project tasks.

Who should attend:

Junior and intermediate DevOps administrators in need of sanity when testing and deploying software.

Take back to work:

Real-world usage of Jenkins for better automation of testing and deployment with your team's software project.

Topics include:
  • Continuous Integration concepts and best practices
  • Jenkins introduction and configuration
  • Jobs, job history, and build artifacts
  • SCM integration and job triggering
  • Jenkins cluster farming: multi-node management made easy
  • Authentication considerations
  • Jenkins plugins
  • Distributed builds for heterogeneous architectures
  • Build/test/deploy pipelines
Additional Materials:

Half Day Afternoon

Jeanne Schock, Afilias

Jeanne has a background in system administration, working with FreeBSD, Solaris, Linux, and Windows. She has worked at a number of companies in the Internet industry, including a regional ISP, a video hosting company, and, for the past six years, Afilias, a Registry and DNS service provider. In her current role of Service Delivery Manager at Afilias, she is owner and manager for key ITIL processes including continual service improvement, release and deployment management, change management, problem management, and incident management. She also manages projects geared toward site reliability and disaster recovery. Jeanne got her BA from Vassar College and did graduate work at the University of Toronto in Medieval European studies, teaching Latin in her pre-IT years.

T7
Disaster Recovery Plans: Design, Implementation and Maintenance Using the ITIL Framework
New!
Jeanne Schock, Afilias

Jeanne has a background in system administration, working with FreeBSD, Solaris, Linux, and Windows. She has worked at a number of companies in the Internet industry, including a regional ISP, a video hosting company, and, for the past six years, Afilias, a Registry and DNS service provider. In her current role of Service Delivery Manager at Afilias, she is owner and manager for key ITIL processes including continual service improvement, release and deployment management, change management, problem management, and incident management. She also manages projects geared toward site reliability and disaster recovery. Jeanne got her BA from Vassar College and did graduate work at the University of Toronto in Medieval European studies, teaching Latin in her pre-IT years.

1:30 pm–5:00 pm

Wilson C Room

The only good disaster recovery plan is the one that actually works when you need it. How do you ensure readiness? How do you know that your plan is aligned with the business objectives of your company? Are you having trouble getting buy-in from management to fund your design? How do you prevent both your disaster plan and design from being out of date just weeks after implementation? This tutorial will provide a step-by-step framework for implementing a DR project through the ITIL Lifecycle approach to IT Service Continuity Management. I will demonstrate how to create a process for ongoing management of your disaster recovery capabilities and how to ensure that you are proactively improving your plan and design.

Who should attend:

System administrators and managers who are responsible for disaster planning and ensuring that the plan is ready when disaster strikes, whether you have a current strategy in place or are starting from scratch.

Take back to work:

A step-by-step framework for designing and implementing your DR strategy, and for making sure that your plan is ready when you need it.

Topics include:
  • Service Continuity Management: Introduction
    • What's a process
    • What are the benefits for DR
    • Every process should have an explicit, documented purpose
    • What you need to start and how it fits into actually implementing DR
  • Lifecycle approach to IT Service Continuity Management
    • Initiation: establish management intention, define policies, define scope, agree on objectives, initiate the project
    • Requirements and strategy
    • Implementation
    • Ongoing operation and maintenance
  • Ensuring ongoing support as well as continual improvement
    • Embed an element of continual improvement to ensure that you are always looking at ways to improve technologies, procedures, capabilities, costs
    • Process considerations
    • Service Continuity and related processes: availability, capacity, and IT security management
Additional Materials:

Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

T8
Backups, Archiving, and Life Cycle Management: Riding the Wave of Data Proliferation
Jacob Farmer, Cambridge Computer Services

Jacob Farmer is an industry-recognized expert on storage networking and data protection technologies. He has authored numerous papers and is a regular speaker at major industry events such as Storage Networking World, VMworld, Interop, and the USENIX conferences. Jacob's no-nonsense, fast-paced presentation style has won him many accolades. Jacob is a regular lecturer at many of the nation's leading colleges and universities. Recently he has given invited talks at institutions such as Brown, Columbia, Cornell, Carnegie Mellon, Duke, Harvard, and Yale. Inside the data storage industry, Jacob is best known for having authored best practices for designing and optimizing enterprise backup systems and for his expertise in the marketplace for emerging storage networking technologies. He has served on the advisory boards of many of the most successful storage technology startups. Jacob is a graduate of Yale. Follow him on Twitter @JacobAFarmer.

1:30 pm–5:00 pm

Harding Room

Most IT organizations report exponential data growth over time, and whether your data doubles every year, every two years, or every five years, the simple fact remains that if your data capacities double, then both the capacity and the performance of your backup system must double. All of this doubling stresses traditional approaches to data management. Thus, it is no surprise that backup/recovery is one of the most costly and unforgiving operations in the data center. Meanwhile, most IT organizations also report that the vast majority of their unstructured data is seldom or never accessed. Files accumulate year after year, choking the backup systems and driving up costs.

This course explores two main ways to manage the data deluge: (1) optimize backup systems by eliminating bottlenecks, streamlining operations, and bulking up backup infrastructure; and (2) manage the life cycles of unstructured data so that files that are not in active use can be managed separately from files that are in active use. We start by offering a simple framework for defining business requirements and comparing solutions at a high level. We then delve into the various mechanisms for lifecycle management and for eliminating backup system bottlenecks. Some time is spent exploring storage systems that have built-in mechanisms for data protection and lifecycle management.
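
One of the topics below, deduplication, comes down to a simple mechanism: split data into blocks, hash each block, and store each unique block only once. A toy sketch of the fixed-block variant, with function names of my own invention:

```python
import hashlib

def dedup_store(blobs, block_size=4):
    """Split each blob into fixed-size blocks and store each unique
    block once, keyed by its SHA-256 digest. Returns (store, recipes):
    store maps digest -> block; each recipe is the digest sequence
    needed to reconstruct one blob."""
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), block_size):
            block = blob[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # identical blocks stored once
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    return b"".join(store[d] for d in recipe)

# Two "backups" that share most of their content:
store, recipes = dedup_store([b"AAAABBBBCCCC", b"AAAABBBBDDDD"])
print(len(store))  # 4 unique blocks stored instead of 6
```

Real products use variable-size, content-defined chunking so that an insertion near the start of a file does not shift every later block boundary, but the storage model is the same.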

Who should attend:

System administrators involved in the design and management of backup systems and policymakers responsible for protecting their organization's data.

Take back to work:

Ideas for immediate, effective, inexpensive improvements to your backup systems and a vision for how you might deploy a lifecycle management system that fits your organization.

Topics include:
  • Formulating strategies for data protection and lifecycle management
  • Identifying and addressing backup system bottlenecks
  • Managing fixed content
  • Hierarchical storage management and data migration
  • In-band versus out-of-band approaches to file lifecycle management
  • Breathing new life into tape storage
  • Deduplication: separating hype from reality
  • Object-based storage models for backup and archiving
  • Self-healing and self-protecting storage systems
  • Leveraging the cloud for backup and archiving
Additional Materials:

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

T9
Documentation Techniques for Sysadmins
Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

1:30 pm–5:00 pm

Thurgood Marshall North Ballroom

Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

Who should attend:

System administrators who need to produce documentation for the systems they manage.

Take back to work:

The ability to make immediate, practical use of these documentation techniques.

Topics include:
  • Why system administrators need to document
  • The document life cycle
  • Targeting your audience
  • An adaptable document framework
  • Common mistakes in documenting
  • Tools to assist the documentation process
Additional Materials:

Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

Soft Skills
T10
Evil Genius 101
New!
Thomas A. Limoncelli, Stack Exchange

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He received the SAGE 2005 Outstanding Achievement Award. He works at Stack Exchange in NYC.

1:30 pm–5:00 pm

Thurgood Marshall West Ballroom

You want to innovate: deploy new technologies such as configuration management (CFEngine, Puppet, Chef), a wiki, or standardized configurations. Your coworkers don't want change. They like things the way they are. Therefore, they consider you evil. However, you aren't evil; you just want to make things better.

Who should attend:

Anyone who wants to improve processes and learn about managing change.

Take back to work:

The communication, analysis, and persuasion skills you need to make your workplace better.

Topics include:
  • Helping your coworkers understand and agree to your awesome ideas
  • Convincing your manager about anything—really
  • Turning the most stubborn user into your biggest fan
  • Getting others to trust you so they are more easily convinced
  • Deciding which projects to do when you have more projects than time
  • Making decisions based on data and evidence
  • Driving improvements based on a methodology and planning instead of guessing and luck
Additional Materials:

Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

Cloud System Administration
T11
High-Availability Linux Clustering
Joshua Jensen, Sirius Computer Solutions

Joshua Jensen has been working with Linux for 15 years, and is a Senior Open Source Architect for Sirius Computer Solutions. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

1:30 pm–5:00 pm

Thurgood Marshall East Ballroom

We are evolving from a single service running on a single server, complete with multiple points of hardware and software failure... but to what? With services and platforms that need to be restored before our pagers or Bat Phones ring, the system administrators of today have a need for high availability. Starting with the current realities of modern data centers, this half-day tutorial will explore practical uses of Linux clusters.

Who should attend:

Linux administrators who are planning on implementing a multi-service fail-over cluster implementation in a production environment. Course attendees should be familiar with the basics of system administration in a Linux environment. At no point will the word "cloud" be used, although novice administrators and gurus alike should leave the tutorial having learned something.

Take back to work:

The knowledge and ability to create and administer highly available services and filesystems on a Linux cluster.

Topics include:
  • Linux HA Cluster technology: Corosync, OpenAIS, rgmanager, Conga
  • Data management with shared disk implementations: SAN, iSCSI, AoE, FCoE
  • Node fencing with STONITH
  • Network power switches and IPMI
  • Clustered logical volume management
  • GFS2 filesystems with Distributed Lock Manager (DLM)
  • Service management with failover domains
  • Virtual machines as a cluster service
  • Cluster administration with luci
  • Working with cluster-unaware services
Additional Materials:
Wednesday, November 6, 2013

Full Day

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

Linux
W1
Linux Performance Tuning
Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

9:00 am–5:00 pm

Lincoln 4 Room

The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workloads on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.
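
As a small taste of the measurement side, here is a sketch of the kind of check the monitoring tools in this class automate: parsing `/proc/meminfo` to estimate how much RAM is actually reclaimable (free memory plus buffers and page cache), a first step in deciding whether memory is a bottleneck. The sample data is made up; on a live Linux box you would read `open("/proc/meminfo")` instead.

```python
# Parse /proc/meminfo-style output and estimate reclaimable memory.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
Buffers:          256000 kB
Cached:          8192000 kB
"""

def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        key, value = line.split(":")
        info[key] = int(value.split()[0])  # values are in kB
    return info

def reclaimable_pct(info):
    # Free + buffers + page cache is a rough upper bound on
    # memory the kernel can reclaim for applications.
    avail = info["MemFree"] + info["Buffers"] + info["Cached"]
    return round(100 * avail / info["MemTotal"], 1)

print(reclaimable_pct(parse_meminfo(SAMPLE)))  # 54.7
```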

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
  • Characterizing your workload's requirements
  • Finding bottlenecks
  • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
  • Latency vs. throughput
  • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

Linux
W2
Root Cause Analysis—Beginner
Updated!
Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

9:00 am–5:00 pm

Hoover Room

This version of the class is aimed at the mid-level sysadmin. You manage servers and/or network gear, look at packet traces, poke through logs—but wouldn't consider yourself an expert at any of this. You want a chance to tackle the problem on your own, followed by guided practice on technique: analyzing a packet trace for performance problems, extracting insights from trending charts, correlating log entries from multiple devices. In this version of the class, we spend time together reviewing concepts (e.g., caching and spindles), applying techniques (e.g., Wireshark features), and asking questions (e.g., about TCP, SMB, and NFS). In addition to the technical contributors, each team will need a problem manager—perhaps a senior engineer, perhaps a resource or project manager comfortable with coordinating teams of techs.

Troubleshooting is hard. In hindsight, the answer to a problem is often obvious, but in the chaos and confusion of the moment—with too much data flowing in, time pressure, misleading clues—slicing through the distractions and focusing on the key elements is tough. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 5–7, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate on a half-hour cycle between coming together as a class and splitting into groups. During class time, I will describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I will walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. Preview the deck to get a feel for how your day will look. BYOL (Bring Your Own Laptop) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Who should attend:

System administrators and network engineers tasked with troubleshooting multidisciplinary problems; problem managers and problem analysts wanting experience coordinating teams.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies:

  • Remote Office Bumps: A remote office ties back to the campus via a 10MB circuit. Intermittently, opening documents on the campus-based file-server is slow, printing is slow, Exchange appointments vanish…
  • Many Applications Crash: Outlook crashes, Word documents fail to save, Windows Explorer hangs: The office automation applications servicing ~1500 users intermittently report a range of error messages; users reboot their machines. Some days are fine, other days are terrible, and the symptoms are worsening…
Additional Materials:

Nathen Harvey, Opscode

Nathen Harvey is a Technical Community Manager at Opscode, the company behind Chef. Nathen is the co-organizer of DevOps DC and the Washington DC MongoDB Users' Group, and co-host of the Food Fight Show, a podcast about Chef and DevOps. Like many others who blog, Nathen updates his blog on a very irregular basis. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and Foursquare.

Coding
W3
Introduction to Chef
New!
Nathen Harvey, Opscode

Nathen Harvey is a Technical Community Manager at Opscode, the company behind Chef. Nathen is the co-organizer of DevOps DC and the Washington DC MongoDB Users' Group, and co-host of the Food Fight Show, a podcast about Chef and DevOps. Like many others who blog, Nathen updates his blog on a very irregular basis. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and Foursquare.

9:00 am–5:00 pm

Washington 5 Room

This full-day tutorial will provide the attendee with a hands-on introduction to the Chef configuration management system for performing common automation tasks. Each exercise will be instructor-led, and introduce new Chef concepts along the way. We'll cover the Anatomy of a Chef Run, Chef's Authentication Cycle, how to build roles, manipulate configuration through data in attributes, use Chef's search API for dynamic configuration, and more.

Who should attend:

Sysadmins interested in using Chef to meet their configuration management needs.

Take back to work:

Hands-on experience configuring Chef and writing Chef cookbooks. The mixture of tutorial and hands-on teaching in this course gives attendees real exposure to Chef concepts, software, and configuration. The material in this course is intended to bootstrap the attendees' knowledge and provide them with the ability to immediately work with Chef outside of the course.

Topics include:
  • Set up a local workstation with Chef and connect to a Chef server
  • Write your first Chef cookbook
  • Use Chef to automate installation of a Nagios server as a real world example
  • Automate some common system tasks with Chef
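
The core idea behind the tasks above is that Chef recipes declare desired state ("this package is installed") rather than imperative steps, and a run converges the node toward that state without redoing work already done. A toy sketch of that convergence idea, in Python rather than Chef's actual Ruby DSL (the function and resource names here are invented for illustration):

```python
# Idempotent convergence: compute and apply only the actions needed
# to bring current state to desired state; a second run is a no-op.

def converge(current, desired):
    """Mutate `current` toward `desired`; return the actions taken."""
    actions = []
    for resource, state in desired.items():
        if current.get(resource) != state:
            actions.append((resource, state))
            current[resource] = state
    return actions

node = {"nagios": "absent", "ntp": "installed"}
wanted = {"nagios": "installed", "ntp": "installed"}
print(converge(node, wanted))  # [('nagios', 'installed')]
print(converge(node, wanted))  # [] -- second run changes nothing
```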
Additional Materials:
Thursday, November 7, 2013

Full Day

Dave Beazley, Dabeaz LLC

David Beazley is the author of the Python Essential Reference and has been an active member of the Python community since 1996. He is most widely known for creating several Python-related open-source packages, including SWIG (a tool for building C/C++ extension modules to Python) and PLY (a Python version of the lex/yacc parsing tools). In the 1990s, while working at Los Alamos National Laboratory, he helped pioneer the use of Python on massively parallel supercomputers. From 1998 to 2005, he was an assistant professor at the University of Chicago, where he taught courses on operating systems, networks, and compilers. Dave is currently the owner of Dabeaz LLC, a company specializing in Python software development and training courses.

Coding
R1
The Python Programming Language
Dave Beazley, Dabeaz LLC

David Beazley is the author of the Python Essential Reference and has been an active member of the Python community since 1996. He is most widely known for creating several Python-related open-source packages, including SWIG (a tool for building C/C++ extension modules to Python) and PLY (a Python version of the lex/yacc parsing tools). In the 1990s, while working at Los Alamos National Laboratory, he helped pioneer the use of Python on massively parallel supercomputers. From 1998 to 2005, he was an assistant professor at the University of Chicago, where he taught courses on operating systems, networks, and compilers. Dave is currently the owner of Dabeaz LLC, a company specializing in Python software development and training courses.

9:00 am–5:00 pm

Madison AB Room

Python is a dynamic programming language that is often described as a "scripting language" along with languages such as Perl, Tcl, and Ruby. Although Python is often used for scripting, it is actually a full-featured, general-purpose programming language that supports a wide variety of imperative, functional, and object-oriented programming idioms. It also includes a large standard library that provides support for operating system interfaces, networking, threads, regular expressions, XML, GUIs, and more.

In this class we'll take a comprehensive tour of the Python programming language and see how it can be used to solve a variety of practical problems. The class will illustrate important concepts through examples that primarily focus on data analysis, systems programming, and system administration.
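
For a flavor of the generator idiom the class covers, here is a lazy pipeline that filters a log stream without loading it all into memory; the log lines are made up for illustration:

```python
# Generator stages compose like shell pipes; nothing runs until
# the final list comprehension iterates the chain.

lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /also-missing 404",
]

def parse(lines):
    for line in lines:
        method, path, status = line.split()
        yield {"method": method, "path": path, "status": int(status)}

def errors(records):
    return (r for r in records if r["status"] >= 400)

bad_paths = [r["path"] for r in errors(parse(lines))]
print(bad_paths)  # ['/missing', '/also-missing']
```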

Who should attend:

Programmers who want to know what the Python programming language is all about and how it can be applied to a variety of practical problems in data analysis, systems administration, systems programming, and networking. Although no prior Python knowledge is required, attendees should already be experienced programmers in at least one other programming language such as C, C++, Java, or Perl. If you already know some Python, this tutorial will improve your skills.

Take back to work:

A better understanding of what makes Python tick and an increased awareness of how it can be successfully applied to real-world problems.

Topics include:

The Python Language

  • Basic syntax
  • Core datatypes
  • Control flow and exception handling
  • Functions
  • Generators
  • Coroutines
  • Modules
  • Classes and the Python object model
  • Decorators
  • C extensions

Major library modules

  • Text processing
  • Operating system interfaces
  • Network programming
  • Internet programming

Practical Programming Examples

  • Text parsing
  • Data analysis and manipulation
  • Processing log files
  • Handling real-time data streams
  • Controlling and interacting with subprocesses
  • Interacting with web services
  • Simple network programming
  • Internet data handling
Additional Materials:

Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

Linux
R2
Root Cause Analysis—Intermediate
New!
Stuart Kendrick, Fred Hutchinson Cancer Research Center

Stuart Kendrick is an IT Architect at the Fred Hutchinson Cancer Research Center, specializing in troubleshooting, device monitoring, and transport. He started his career in 1984, writing in FORTRAN on Crays for Science Applications International Corporation; he worked in help desk, desktop support, system administration, and network support for Cornell University in Ithaca and later Manhattan. He has been in his multi-disciplinary role at FHCRC in Seattle since 1993, where he functions as ITIL problem manager/problem analyst and leads root cause analysis efforts. He is happiest when correlating packet traces with syslog extracts and writing scripts to query device MIBs.

9:00 am–5:00 pm

Hoover Room

This version of the class is aimed at the senior sysadmin. You have a decade or more experience in the industry, you are T-shaped (specialize in one or two areas but have expertise across a range of technologies), and you have accumulated numerous technical skills; now you want to deepen your meta-expertise. We will create the fog of war and then you’ll practice applying a methodology to focus your attention, working with your team to divvy up tasks, escalate key insights to each other, integrate clues from a range of sources, and produce reports for business leadership. In this version of the class, we spend more time in small groups and more time practicing communication skills than we do in the beginner version. In addition to the technical contributors, each team will need a problem manager—perhaps an unusually broad engineer, perhaps a resource or project manager comfortable with coordinating teams of techs.

Troubleshooting is hard. In hindsight, the answer to a problem is often obvious, but in the chaos and confusion of the moment—with too much data flowing in, time pressure, misleading clues—slicing through the distractions and focusing on the key elements is tough. This is a hands-on seminar: you will work through case studies taken from real-world situations. We divide into groups of 5–7, review a simplified version of Advance7's Rapid Problem Resolution (RPR) methodology, and then oscillate on a half-hour cycle between coming together as a class and splitting into groups. During class time, I will describe the scenario, explain the current RPR step, and offer to role-play key actors. During group time, I will walk around, coaching and answering questions.

The course material includes log extracts, packet traces, strace output, network diagrams, Cacti snapshots, and vendor tech support responses, all taken from actual RCA efforts. Preview the deck to get a feel for how your day will look. BYOL (Bring Your Own Laptop) for some hands-on, interactive, team-oriented, real-world puzzle solving.

Who should attend:

Sysadmins and network engineers involved in troubleshooting multidisciplinary problems; problem managers and problem analysts wanting experience coordinating teams.

Take back to work:

Practice in employing a structured approach to analyzing problems that span multiple technology spaces.

Topics include:

Case studies:

  • HPC Cluster Woes: Intermittently, interactive performance on a high-performance computing cluster grinds to a halt, nodes hang, jobs vanish from the queue…
  • Storage Stumbles: Most of the company relies on an 800TB wide-striped storage system, with a multi-protocol (SMB, NFS, iSCSI) front-end from one manufacturer plugged into a Fibre-Channel attached back-end from another manufacturer. Intermittently, the back-end fries a disk, IO latency spikes, clients crash…
Additional Materials:

Lee Damon, University of Washington

Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in LISA (formerly SAGE) (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04, chaired CasITconf '11 and '13, and is co-chairing CasITconf '14.

Mike Ciavarella, Coffee Bean Software Pty Ltd

Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

Linux
R3
Seven Habits of the Highly Effective System Administrator: Hints, Tricks, Techniques, and Tools of the Trade
Lee Damon, University of Washington

 
Mike Ciavarella, Coffee Bean Software Pty Ltd

9:00 am–5:00 pm

Washington 4 Room

We aim to accelerate the experience curve for junior system administrators by teaching them the tricks (and effective coping strategies) that experienced administrators take for granted and which are necessary for successful growth of both the administrator and the site.

The class covers many of the best practices that senior administrators have long incorporated in their work. We will touch on tools you should use, as well as tools you should try to avoid. We will touch on things that come up frequently, as well as those which happen only once or twice a year. We will look at a basic security approach.

Who should attend:

Junior system administrators, from those just starting out to those with 3+ years of experience in computer system administration. We will focus on enabling the junior system administrator to "do it right the first time." Some topics will use UNIX-specific tools as examples, but the class is applicable to any sysadmin and any OS. Most of the material covered is "the other 90%" of system administration—things every sysadmin needs to do and to know, but which aren't details of specific technical implementation.

Take back to work:

Ideas about how to improve and to streamline your systems and your workload, and, just as important, where to look to find more answers.

Topics include:
  • The five things every site should know
  • Why your computers should all agree on what time it is
  • Why root passwords should not be the same on every computer
  • Why backing up every file system on every computer is not always a good idea
  • Policies—where you want them and where you might want to avoid them
  • Ethical issues
  • Growth and success as a solo-sysadmin as well as in small, medium, and large teams
  • Training, mentoring, and personal growth planning
  • Site planning and roadmaps
  • Logistics
  • Books that can help you and your users
Additional Materials:
Friday, November 8, 2013

Full Day

James Sweeny, Puppet Labs

James Sweeny is a Professional Services Engineer for Puppet Labs. He spends most of his time flying around the world helping companies solve their systems management problems and ranting on endless tangents about the wonders of Puppet and open source software. He hopes one day to stop receiving phantom Nagios alerts from his past life as a sysadmin. Though he calls New York City home, he is rarely there; he is happy anywhere with good food and good beer.

Coding
F1
Introduction to Puppet Enterprise
New!
James Sweeny, Puppet Labs

9:00 am–5:00 pm

Washington 4 Room

This training course is intended as a one-day introduction for technical audiences who want to understand and practice the main concepts of Puppet Enterprise, Puppet Labs' IT automation software.

Who should attend:

System administrators, IT managers, and any students who are new to Puppet Enterprise and need a better understanding of the concepts of Puppet combined with hands-on experience with basic Puppet coding and implementation.

Take back to work:

The ability to discuss the benefits of Puppet Enterprise and to practice basic system administrator competencies in using the IT automation software to manage your infrastructure effectively.

Topics include:
  • About Puppet technology
  • Why Puppet?
  • Modules and classes
  • Puppet Agent and Puppet Master
  • Additional Puppet concepts
  • Puppet Forge
  • Puppet Enterprise
  • Puppet Labs technical curriculum and certification
Additional Materials:

David Rhoades, Maven Security Consulting Inc.

David Rhoades is a senior consultant with Maven Security Consulting Inc. (www.mavensecurity.com). Maven Security Consulting Inc. is a Delaware corporation that provides information security assessments and training to a global clientele. David's expertise includes Web application security, network security architectures, and vulnerability assessments. Past customers have included domestic and international companies in various industries, as well as various US government agencies. David has been active in information security consulting since 1996, when he began his career with the computer security and telephony fraud group at Bell Communications Research (Bellcore). David has a Bachelor of Science degree in Computer Engineering from the Pennsylvania State University.

Steve Pinkham, Maven Security Consulting Inc.

Steve Pinkham is a security researcher for Maven Security Consulting Inc. (www.mavensecurity.com). Steve has spent time in systems administration, programming, security research, and consulting. He dabbles in electronics and philosophy, and enjoys cheese and fine tea. If you're unlucky you might run into him wearing a backpack in some remote corner of the world. He holds a Bachelor of Science in Computer Science from Virginia Commonwealth University.

Linux
F2
Hands-on Introduction to Common Hacking Tools
New!
David Rhoades, Maven Security Consulting Inc.

 
Steve Pinkham, Maven Security Consulting Inc.

9:00 am–5:00 pm

Lincoln 3 Room

As an increasing number of valuable corporate assets are made available over computer networks, understanding the types of threats facing your organization, and possessing the skills required to identify and mitigate flaws on your networks, are increasingly important components of an organization's risk posture.

This full-day course will enable IT professionals to better understand the methodologies, tools, and techniques used by attackers against their technical infrastructure, with the primary aim of helping them develop better defense methods and capabilities. Attendees will learn to perform basic technical security vulnerability assessment tasks and gain a strong foundation for future studies in host and network security assessment.

The workshop will use Kali Linux™, a collection of free and open source security tools that many attackers and security practitioners use. Students will have access to a network of targets that will allow them to get hands-on experience, deepening their understanding of the process and tools and of how to counter them effectively. Guidance will be provided for continued learning after the course for students who wish to go on to master the tools and techniques introduced here.

Who should attend:

Auditors who want to better understand the methodologies, tools, and techniques used by attackers against their networks and who need help developing better policy.

Take back to work:

The ability to perform basic assessment tasks and a strong foundation for future studies in host and network security assessment.

Topics include:
  • Discussion of current and emerging attacker methods, techniques, concepts, and tools
  • Setting up and using Kali Linux, a collection of security tools (Kali is the successor to Backtrack)
  • Understanding the basic steps an attacker uses to penetrate a network
  • Discussing and using tools for network mapping and analysis
  • Configuring and using the OpenVAS vulnerability scanner to audit network and host security
  • Setting up and using Metasploit exploit framework to exploit found flaws
  • Understanding password cracking, enabling you to craft better authentication capabilities and audit password strength
  • An overview of building client-side exploits and basic anti-virus evasion techniques
  • Understanding the common Web flaws of SQL injection and Cross Site Scripting (XSS), and demonstrating their impact on various applications and solutions
  • References to remediate or implement compensating controls
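To make one of the web flaws in the list above concrete, here is a minimal sketch (my own illustration, not course material) of SQL injection against a string-built query, and the parameterized fix, using Python's standard sqlite3 module; the table and values are hypothetical:

```python
import sqlite3

# SQL injection in miniature: concatenating user input into a query
# lets the input rewrite the query; bound parameters keep it as data.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

evil = "nobody' OR '1'='1"

# Vulnerable: the OR clause smuggled in via `evil` matches every row.
rows = db.execute(
    "SELECT secret FROM users WHERE name = '" + evil + "'"
).fetchall()
print(rows)  # [('hunter2',)] -- leaks alice's secret

# Safe: the whole string is bound as a literal name, matching nothing.
rows = db.execute(
    "SELECT secret FROM users WHERE name = ?", (evil,)
).fetchall()
print(rows)  # []
```

The same pattern—attacker-controlled strings crossing a code/data boundary—underlies XSS as well, with HTML rather than SQL as the target language.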
Additional Materials:

Geoff Halprin, The SysAdmin Group

Geoff Halprin has spent over 30 years as a software developer, system administrator, consultant, and troubleshooter. He has written software from system management tools to mission-critical billing systems, has built and run networks for enterprises of all sizes, and has been called upon to diagnose problems in every aspect of computing infrastructure and software.

He is the author of the System Administration Body of Knowledge (SA-BOK) and the USENIX Short Topics in System Administration book A System Administrator's Guide to Auditing. He was the recipient of the 2002 SAGE-AU award for outstanding contribution to the system administration profession.

Geoff has served on the boards of SAGE, SAGE-AU, USENIX, and LOPSA. He has spoken at over 20 conferences in Australia, New Zealand, Canada, Europe, and the US.

Soft Skills
F3
Combined MGMT-101/MGMT-201: Effective Communication and Team Leadership for Sysadmins
New!
Geoff Halprin, The SysAdmin Group

9:00 am–5:00 pm

Lincoln 2 Room

As a technical expert, you have learned how to wield the keyboard to great effect to solve technical problems. But you have also learned that this is not enough. To be effective, you need the support of others to fund, approve, assist with, contribute to, or support your work. People can make computers seem simple.

This tutorial is about becoming a more effective system administrator through improved communication skills. In part one, we will help you understand how communication works, so that it becomes a tool you can use instead of a source of frustration.

As you grow in seniority, a funny thing happens: you are expected to pass that wisdom on to others. You are given projects to run, teams to lead, apprentices to mentor, and ever larger budgets to manage effectively. The one thing, however, that you almost never receive is management training. In part two, we will turn our attention to team management. Management, like any system, can be learned. There are tools, techniques, and tips that you can call on to be effective in your "organization-facing" duties.

This is an immersion course.

Who should attend:

System administrators who wish to learn tools and tips that will help them communicate more effectively with their managers, users, and other important constituents; system administrators who have found themselves (or hope to be, or apprehensively anticipate) being given responsibility for "wetware systems" (i.e., other people).

Take back to work:

A bag full of tools to help you be more effective in your people-facing activities.

Topics include:
  • Oral communication (effective listening, effective talking, presentations)
  • Written communication (progress reporting, technical documentation, writing proposals, buy-vs.-build evaluations, cost-risk evaluations, audit reports)
  • Understanding others (understanding various communities, conflict resolution, personality types)
  • Time management
  • Risk management
  • Project management
Additional Materials:
Monday, February 17, 2014

Half Day Morning

Mark Grover, Cloudera, Inc.

Mark Grover is a committer on Apache Bigtop, a committer and PMC member on Apache Sentry (incubating) and a contributor to Apache Hadoop, Apache Hive, Apache Sqoop and Apache Flume. He is also a section author of O’Reilly’s Programming Hive. Mr. Grover presently works as a software engineer at Cloudera and frequently presents on Hadoop ecosystem technologies at software conferences.

M1
Introduction to Apache Hadoop and Its Ecosystem
Mark Grover, Cloudera, Inc.

9:00 am–12:30 pm

Ballroom A

Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it offers scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. Additionally, you'll learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.

Who should attend:

This session is intended for those who are new to Hadoop and are seeking to understand what Hadoop is, the ways that organizations are using it, and how it compares to and integrates with other systems. It assumes no prior knowledge of Hadoop, and explanations of technical topics like MapReduce and HDFS replication are clear and concise, making it appropriate for anyone attending the conference.

Topics include:
  • What Hadoop is and how organizations are using it
  • How the HDFS filesystem provides reliability and high throughput
  • How MapReduce enables parallel processing on large data sets
  • Explanations of some popular open source tools that integrate with Hadoop
  • Typical architecture of a Hadoop cluster
  • Considerations for hosting a Hadoop cluster
  • Emerging trends in the design and implementation of Hadoop
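The MapReduce model listed above—map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group independently—can be sketched in a few lines of plain Python. This is a toy word-count illustration of the model, not the Hadoop API:

```python
from collections import defaultdict

# Toy word count in the MapReduce style. Because each reduce group is
# independent, Hadoop can run reducers in parallel across a cluster.

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently.
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 2
```

In real Hadoop, the map and reduce functions run on different machines, HDFS holds the input and output, and the framework performs the shuffle over the network.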
Additional Materials:

W. David Schwaderer, Silicon Valley Consultant

W. David Schwaderer has a master's degree in Applied Mathematics from the California Institute of Technology and an MBA from the University of Southern California. He is the former Editor-in-Chief of the VERITAS Architect Network and the Symantec Technology Network. He presently consults for Samsung Semiconductor, Inc.'s Silicon Valley Systems Architecture Lab, where he helps world-class engineers develop Flash memory storage innovations that will intercept your family's future.

In all his writings and training seminars, David applies Einstein's (disputed) observation that "Everything should be made as simple as possible, but not simpler." As a multidisciplinary technologist, he has authored technical books on a wide spectrum of topics ranging from data storage systems, data management, communication signaling, C Language programming, ASIC core interfacing, and Digital Image Processing. He has seven books on innovation planned following 15 years of intense research on the topic. His 12th, and possibly last, technical book, co-authored with Jason Resch, is titled "Exabyte Data Preservation, Postponing the Inevitable."

David has presented at IEEE conferences, Stanford, MIT, Intel, Google, Sun/Oracle Labs, and across Silicon Valley. His four innovation Google TechTalks on YouTube have recorded nearly 39,000 views. At his recent Joint IEEE Comsoc-CEsoc SCV presentation titled "Broadcast Storage for Video-Intensive Worlds", he was accorded the title "Silicon Valley Icon."

On a good day, Google Web searches for "W. David Schwaderer" indicate about 1.5 million hits. But on a bad day, it's only around 900,000.

M2
Erasure Code Principles Made So Simple, You'll Really Like Them
W. David Schwaderer, Silicon Valley Consultant

9:00 am–12:30 pm

Ballroom B

Erasure Code storage applications (RAID 6, Object Storage, Information Dispersal, etc.) are all the rage, and deservedly so. They have intrinsic, engineering beauty and elegance that merit front-row seats in deep, advanced-technology discussions. But mastering Erasure Code principles can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to uncounted intrepid, serious, and competent engineers—maybe even you. Luckily, that's totally unnecessary.

This presentation's goal is to arm aspiring, inquisitive engineers with Erasure Code foundational insights, intuition, and fundamental understandings that enable them to totally dominate Erasure Code discussions, both on their home courts and on their own terms.

Make no mistake: this session intends to be fun, but technically informative at a deep, visceral level. There will even be a Python programming demonstration, time allowing. Erasure Code principles likely will never be made more accessible than what you experience here. This is the Erasure Code train to catch; don't be left behind.

Topics include:
  • Numbers, Counting Ducks, Clubs, and Special Club Members Such as 0 and 5
  • Elementary School Arithmetic—Addition and Multiplication
  • Powers and Inverse Powers—2x2x2x2 = 2^4 
  • Solving High School Equations—Determining Apple and Banana Prices
  • The Parallel Universes Around Us—Star Trek Stuff or Just GF(N)s?
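As a taste of the arithmetic behind these topics, here is a minimal sketch (my own illustration, not course material) of the simplest erasure code: XOR parity over GF(2). In GF(2), addition is XOR and every element is its own inverse, so "solving the equation" for a missing block is just XOR-ing the survivors:

```python
# Simplest erasure code: one XOR parity block over GF(2).
# Any single lost block is recoverable, because in GF(2)
# subtraction and addition are the same operation (XOR).

def make_parity(blocks):
    # Parity = XOR of all data blocks, bytewise.
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

def recover(survivors, parity):
    # XOR the parity with every surviving block to get the lost one.
    missing = parity
    for block in survivors:
        missing = bytes(a ^ b for a, b in zip(missing, block))
    return missing

data = [b"duck", b"club"]
p = make_parity(data)
print(recover([data[1]], p))  # b'duck' -- the lost block comes back
```

Tolerating two simultaneous failures (RAID 6 and beyond) requires arithmetic in a larger field such as GF(2^8)—the "parallel universes" the session alludes to—but the solving-for-unknowns idea is the same.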
Additional Materials:

Half Day Afternoon

Jeff Darcy, Red Hat

Jeff Darcy (S3) has worked on network and distributed storage systems for 20 years, including an instrumental role in developing MPFS (a precursor of modern pNFS) while at EMC. He is currently a member of the GlusterFS architecture team at Red Hat and frequently gives talks and tutorials about topics related to cloud storage.

M3
Building a Cloud Storage System
Jeff Darcy, Red Hat

1:30 pm–5:00 pm

Ballroom A

Cloud storage has become an important part of both the way that modern compute clouds are built and the service that they provide for users. This tutorial will explain what cloud storage systems have in common and what makes each one different, enabling attendees to select or build the right system for their specific needs.

Who should attend:

Primarily, people who wish to implement their own task-specific cloud storage systems. Secondarily, those who wish to understand the tradeoffs implicit in existing cloud storage systems.

Topics include:
  • Types of cloud storage: service for a cloud provider, service for cloud users, or service for consumers
  • Tradeoffs between consistency, performance, and availability
  • Special requirements: security and privacy, legal and regulatory compliance
  • Common techniques: membership and leader election, consistent hashing, vector clocks, Merkle trees, Bloom filters
  • Case studies: existing systems representing different tradeoffs and techniques
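Of the common techniques listed above, consistent hashing is perhaps the easiest to sketch. This is a minimal illustration (names and parameters are mine, not any particular system's):

```python
import bisect
import hashlib

# Minimal consistent-hash ring: keys and nodes share one hash space,
# and a key maps to the first node clockwise from the key's hash.
# Adding or removing a node remaps only the keys in that node's arc,
# not the whole key space.

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node appears `replicas` times as virtual nodes,
        # which smooths out the load distribution.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key):
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["storage-a", "storage-b", "storage-c"])
print(ring.node_for("user/42/photo.jpg"))  # one of the three nodes
```

Real systems layer replication, membership protocols, and failure detection on top of this placement function, which is where the tradeoffs between consistency, performance, and availability come in.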
Additional Materials:

Brent Welch, Google

Brent Welch is a senior software engineer at Google. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

M4
Clustered and Parallel Storage System Technologies
Brent Welch, Google

1:30 pm–5:00 pm

Ballroom B

This tutorial is oriented toward administrators and developers who manage and use HPC systems, and especially for those involved with storage systems in these environments. Storage is often a critical part of the HPC infrastructure. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.  

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices and SSDs, in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, and object-based parallel file system architectures.

Topics include:
  • Scaling the data path
  • Scaling metadata
  • Fault tolerance
  • Manageability

Specific systems are discussed, including Lustre, GPFS, PanFS, HDFS (Hadoop File System), OpenStack, and the NFSv4.1 standard for parallel I/O. We continue up the stack to discuss MPI-IO middleware that is often used in large parallel programming environments for efficient I/O at scale.

Additional Materials:
Tuesday, June 17, 2014

Half Day Morning

Daniel A. Menascé, George Mason University

Daniel Menascé is the author of over 225 papers and five books published by Prentice Hall and translated into Russian, Korean, and Portuguese. He has received two Outstanding Teaching awards from George Mason University and was a finalist in the 2014 Outstanding Faculty competition among all public and private higher education institutions in the state of Virginia. Menascé has given keynote addresses and presented tutorials at various conferences.

T1
Autonomic Computing and Its Applications
Daniel A. Menascé, George Mason University

9:00 am–12:30 pm

Washington AB

This tutorial provides an overview of autonomic computing (AC) and the various technologies that have been used to design and implement AC systems. Examples will be given in a variety of areas. The tutorial follows this outline:

  1. AC Overview (15 min)
  2. Techniques used: model-driven, learning-based, control-theory (45 min)
  3. Applications of AC (1 hr 45 min):
    • Cloud computing and data centers
    • Adaptive software systems
    • E-commerce and Web systems
    • SOA systems
    • Databases
    • Emergency departments
  4. Concluding Remarks (15 min)
Who should attend:

Researchers and Autonomic Computing (AC) practitioners. No prerequisites required.

Additional Materials:

Half Day Afternoon

Iqbal I. Mohomed, IBM T.J. Watson Research Center

Dr. Iqbal Mohomed is a Research Staff Member at IBM's T.J. Watson Research Center in NY. His research interests are in distributed systems and cloud and mobile computing. Most recently, he has been working on workload orchestration in cloud environments.

Dr. Mohomed earned his Ph.D. from the University of Toronto in 2008. His dissertation work was on automatic customization of web content for mobile devices. As a postdoc at Microsoft Research Silicon Valley, he worked on several middleware systems for mobile devices including the StarTrack middleware for efficient organization of user location data and the Contrail system for enabling secure communication across mobile users (awarded Best Paper at Middleware 2011). At IBM Research, Dr. Mohomed has worked on various projects including the use of personal mobile devices to enable efficient long-term health monitoring (HARMONI), a cloud-based monitoring system for virtual machines (Cloudscope) and a distributed system for optimizing placement of resources in a private cloud computing system.

Asser N. Tantawi, IBM T.J. Watson Research Center

Dr. Asser N. Tantawi is a Research Staff Member at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY. He received his Ph.D. degree in computer science from Rutgers University in 1982. Dr. Tantawi has published numerous articles in scientific journals and international conferences. His fields of interest include cloud optimization, workload management, analytics, systems modeling, stochastic analysis, model-based control, load balancing, resource optimization, and queueing models. He is a senior member of IEEE and a member of ACM and IFIP WG 7.3 (Computer System Modeling). He has also served as an ACM national lecturer.

T2
Autonomic Cloud Workload Optimization: Placement in OpenStack
Iqbal I. Mohomed, IBM T.J. Watson Research Center

Dr. Iqbal Mohomed is a Research Staff Member at IBM's T.J. Watson Research Center in NY. His research interests are in distributed systems and cloud and mobile computing. Most recently, he has been working on workload orchestration in cloud environments.

Dr. Mohomed earned his Ph.D. from the University of Toronto in 2008. His dissertation work was on automatic customization of web content for mobile devices. As a postdoc at Microsoft Research Silicon Valley, he worked on several middleware systems for mobile devices including the StarTrack middleware for efficient organization of user location data and the Contrail system for enabling secure communication across mobile users (awarded Best Paper at Middleware 2011). At IBM Research, Dr. Mohomed has worked on various projects including the use of personal mobile devices to enable efficient long-term health monitoring (HARMONI), a cloud-based monitoring system for virtual machines (Cloudscope) and a distributed system for optimizing placement of resources in a private cloud computing system.

 
Asser N. Tantawi, IBM T.J. Watson Research Center

Dr. Asser N. Tantawi is a Research Staff Member at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY. He received his Ph.D. degree in computer science from Rutgers University in 1982. Dr. Tantawi has published numerous articles in scientific journals and international conferences. His fields of interest include cloud optimization, workload management, analytics, systems modeling, stochastic analysis, model-based control, load balancing, resource optimization, and queueing models. He is a senior member of IEEE and a member of ACM and IFIP WG 7.3 (Computer System Modeling). He has also served as an ACM national lecturer.

1:30 pm–5:00 pm

Washington AB

This tutorial opens the door for the ICAC audience to apply autonomic computing ideas to the optimized deployment of workloads in the cloud. We have designed the tutorial to have two parts: (I) an overview of cloud management, OpenStack, Heat, and HOT technologies; and (II) optimization algorithms for solving the large-scale placement problem of workloads in the cloud in a scalable manner. Part I acts as an introduction to the area for those who may be experts in autonomic computing but are not familiar with the state of the art of cloud management. Part II should appeal to theoreticians and application-oriented attendees alike.

  • Overview of cloud management (1.5 hrs):
    • Overview of OpenStack open source cloud software
    • Heat template-driven orchestration engine
    • HOT: The Heat orchestration template
    • Cloud workload definition
    • Architecture of a workload placement engine
    • End-to-end flow
  • Workload Optimization (1.5 hrs)
    • Definition of workload placement optimization problem
    • Problem complexity and scalability
    • Algorithmic approaches to placement optimization
    • Examples and case studies
Additional Materials:
Thursday, June 19, 2014

Half Day Morning

David Nalley, Citrix Systems

David Nalley is a recovering systems administrator of 10 years who then spent an additional three years as a consultant. David is a committer on the Apache CloudStack project. Within the Fedora Project, David maintains a few dozen software packages, sponsors new packagers, and currently serves on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks frequently at IT and F/LOSS conferences.

R2
Using Apache CloudStack to Build a Big IaaS Cloud
David Nalley, Citrix Systems

David Nalley is a recovering systems administrator of 10 years who then spent an additional three years as a consultant. David is a committer on the Apache CloudStack project. Within the Fedora Project, David maintains a few dozen software packages, sponsors new packagers, and currently serves on the Fedora Project Board. David is a frequent author for development, sysadmin, and Linux magazines and speaks frequently at IT and F/LOSS conferences.

9:00 am–12:30 pm

Riverview A

Infrastructure as a Service (IaaS) allows your customers to spin up VMs, services, and even machine images on demand. In this class, you will learn how to build IaaS clouds using Apache CloudStack. We'll cover everything from initial configuration to scaling and maintaining availability. Apache CloudStack works with KVM, vSphere, and XenServer/XCP for virtualization, and also supports Amazon Web Services.

Who should attend:

Intermediate to advanced sysadmins or enterprise architects wanting to deploy a production Infrastructure as a Service cloud.

Prerequisites: Experience with virtualization platforms and deep understanding of layer two and three (L2/L3) networking is preferred but not required.

Take back to work:

What you need to deploy an IaaS cloud based on Apache CloudStack in an enterprise environment.

Topics include:
  • Deploying Apache CloudStack 
  • Next-generation network topologies, including SDN 
  • Scaling storage without becoming indentured to SAN vendors
  • Making CloudStack scale to tens of thousands of physical nodes 
  • Maintaining availability in a "failure-is-assured" environment
Additional Materials:

Half Day Afternoon

Joshua Jensen, EMC

Joshua Jensen has been working with Linux for 18 years and is currently a Linux/Virtualization/Cloud Architect for EMC. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

R3
Jenkins for Continuous Integration
Joshua Jensen, EMC

Joshua Jensen has been working with Linux for 18 years and is currently a Linux/Virtualization/Cloud Architect for EMC. He has worked in the past as the Lead Linux IT Architect for Cisco Systems, and was Red Hat's first instructor, examiner, and RHCE. At Red Hat, he wrote and maintained large parts of the Red Hat curriculum: Networking Services and Security, System Administration, Apache and Secure Web Server Administration, and the Red Hat Certified Engineer course and exam.

1:30 pm–5:00 pm

Riverview A

Your software development project needs a butler. Why? Your development team may be guilty of writing great code while only sporadically testing it, and pushing your code-base to production is tedious at best. Enter Continuous Integration with Jenkins. Learn how this development methodology and open source project can improve your software development process by consistently and continuously automating your software projects’ integration, testing, and deployment. I will walk you through a scenario of setting up a project using Jenkins. This course will conclude with a hands-on lab incorporating a GitHub project into an automated Jenkins Build Pipeline.

Who should attend:

DevOps administrators in need of sanity when testing and deploying software

Lab requirements: Each participant in this class will need a laptop with two virtual machines preinstalled with any Jenkins-compatible Linux distribution.

Take back to work:

Real world usage of Jenkins for better automation of testing and deployment with your team's software project.

Topics include:
  • Continuous integration concepts and best practices 
  • Jenkins concepts and configuration 
  • Jobs, job history, and build artifacts 
  • SCM integration and job triggering 
  • Jenkins cluster farming: multi-node management made easy 
  • Authentication with Active Directory, LDAP, and role-based authorization 
  • Jenkins plugins and third-party integration 
  • Distributed builds for heterogeneous architectures 
  • Build/test/deploy pipelines 
  • Hands-on lab incorporating a GitHub project into an automated Build Pipeline
Additional Materials:
Friday, June 20, 2014

Full Day

Salim Virji, Google

Salim Virji has been developing software and administering systems for 20 years. After working as an applications software engineer at several startups, he found his interest drawn to distributed file systems and cluster computing while pursuing a hobby project with friends.

At Google, he has worked on developing and deploying the company's core software building blocks. Since 2003, he has held the position of Site Reliability Engineer, responsible for technical infrastructure such as Bigtable, Chubby, and GFS; more recently, he has joined the team developing software infrastructure for Google Apps such as Docs and Presentations. He also collaborates with Google's in-house training programs for engineers, developing curriculum and presenting hands-on workshops to introduce engineers to Google technology and techniques.

He received an A.B. in Classics from the University of Chicago.

F1
SRE University—Practical Large System Design
Salim Virji, Google

Salim Virji has been developing software and administering systems for 20 years. After working as an applications software engineer at several startups, he found his interest drawn to distributed file systems and cluster computing while pursuing a hobby project with friends.

At Google, he has worked on developing and deploying the company's core software building blocks. Since 2003, he has held the position of Site Reliability Engineer, responsible for technical infrastructure such as Bigtable, Chubby, and GFS; more recently, he has joined the team developing software infrastructure for Google Apps such as Docs and Presentations. He also collaborates with Google's in-house training programs for engineers, developing curriculum and presenting hands-on workshops to introduce engineers to Google technology and techniques.

He received an A.B. in Classics from the University of Chicago.

9:00 am–5:00 pm

Riverview A

In this class, you will learn about large system design. Truly large-scale systems are still rare, and in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system, so they outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm that a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who can also fill in the details are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to an SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data security impact the cost of implementation. You will be involved in classroom exercises in small groups, each with its own Google SRE mentor, while working out these designs.

Who should attend:

System administrators, SREs, and DevOps engineers who have some familiarity with distributed systems, server hardware, and systems programming, especially those who would like to work with, procure, or build large distributed systems.

Take back to work:

How to design large distributed systems, how to evaluate design proposals, and how to explain such designs to third parties.

Topics include:
  • Design patterns for large distributed systems 
  • Monitoring large-scale systems 
  • Large-scale design workshop and presentations 
  • Non-abstract design; taking a design and producing a "bill of materials" 
  • Designing to fail; how to work around rack, networking, and datacenter failures
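The last two topics invite a worked example. As a minimal sketch (all numbers are hypothetical, not taken from the course materials), here is the kind of back-of-envelope calculation that turns a capacity target into a fleet size that still meets peak load after losing a datacenter:

```python
import math

def total_servers(peak_qps, qps_per_server, datacenters):
    """Servers needed so that losing any one datacenter still leaves
    enough capacity for peak load (assumes an even spread of servers)."""
    base = math.ceil(peak_qps / qps_per_server)
    # After one of D datacenters fails, (D - 1)/D of the fleet survives,
    # so the fleet must be overprovisioned by a factor of D/(D - 1).
    return math.ceil(base * datacenters / (datacenters - 1))

# Hypothetical workload: 90,000 QPS peak, 500 QPS per server, 3 sites.
print(total_servers(90_000, 500, 3))  # 180 base servers -> 270 total
```

The same overprovisioning argument applies one level down, to rack and networking failures within a site.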
Additional Materials:
Sunday, November 9, 2014

Full Day

Chris Jones, Google, Inc.

A computational daemonologist, Chris works in San Francisco as a Site Reliability Engineer for Google App Engine, a platform serving over 28 billion requests per day. He was previously responsible for the care and feeding of advertising statistics, data warehousing, and customer support systems, joining Google in 2007. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in computer engineering, economics, and technology policy along the way.

S1
SRE Classroom: Practical Large Scale Design
Chris Jones, Google, Inc.

A computational daemonologist, Chris works in San Francisco as a Site Reliability Engineer for Google App Engine, a platform serving over 28 billion requests per day. He was previously responsible for the care and feeding of advertising statistics, data warehousing, and customer support systems, joining Google in 2007. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in computer engineering, economics, and technology policy along the way.

9:00 am–5:00 pm

Grand Ballroom C

Truly large-scale systems are still rare, and in a world of outsourcing and cloud computing, it's harder for system administrators to get the opportunity to design large systems. It's even harder to get the design right. Most organizations don't have the in-house expertise to build a large system, so they outsource the detailed design to external contractors. If your organization doesn't have the expertise to design a large system, it's unlikely that it has the expertise to confirm that a proposal is fit for purpose and cost-effective.

While anyone can wave their hands convincingly and come up with a rough outline of a large distributed system, those who can also fill in the details are highly prized. This class will teach you how to design software systems like Imgur and Twitter, then estimate the hardware needed to ensure you can deliver to a hard SLA. You will learn how requirements like queries-per-second, multi-site reliability, and data security impact the cost of implementation.

This class will *not* cover concurrent programming or the software-engineering side of distributed systems.

Who should attend:

System administrators who want to understand large systems, systems engineers who want to build them, and business people who expect to procure them.

Take back to work:

The ability to assess whether a design will work before it is built, to communicate large system designs to systems architects and business people, and to assess requirements, communicate constraints, and predict the performance of large systems before money is spent.

Topics include:
  • How design works
  • Finding bottlenecks in systems
  • Large scale design patterns
  • Lab work: gathering requirements in teams
  • Lab work: estimating hardware requirements
  • Lab work: building in reliability
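The hardware-estimation lab above boils down to arithmetic over the requirements. Here is a minimal sketch (the workload figures are invented for illustration, not the class's numbers) of sizing storage for an Imgur-like image service:

```python
import math

# Hypothetical requirements for an image-sharing service:
uploads_per_day = 1_000_000
avg_image_bytes = 500 * 1024      # 500 KiB per image
replication     = 3               # durable copies of each object
retention_days  = 365
disk_bytes      = 4 * 10**12      # 4 TB drives

# Bytes written over the retention window, times replication,
# divided by per-drive capacity, gives the drive count.
raw_bytes = uploads_per_day * avg_image_bytes * retention_days
disks = math.ceil(raw_bytes * replication / disk_bytes)
print(disks)  # 141 drives, before filesystem overhead and growth headroom
```

Presenting the inputs and the formula separately, as above, is what lets a reviewer challenge each assumption rather than the final number.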
Additional Materials:

Jennifer Davis, Chef

Jennifer has years of experience improving platform development efficiency by building reliable large-scale services, reducing the complexity of product implementations, and automating infrastructure. As a Chef Automation Consulting Engineer, she helps companies discover their own best practices. Prior to joining Chef, she was a lead service engineer at Yahoo! within cloud platform services, supporting the multi-tenant Hadoop environment and Sherpa. She is also a sparkly devops princess who is passionate about increasing diversity in tech. She doesn’t play favorites for programming languages or editors, but her favorite board game is Waterdeep.

S3
Hadoop Operations
Jennifer Davis, Chef

Jennifer has years of experience improving platform development efficiency by building reliable large-scale services, reducing the complexity of product implementations, and automating infrastructure. As a Chef Automation Consulting Engineer, she helps companies discover their own best practices. Prior to joining Chef, she was a lead service engineer at Yahoo! within cloud platform services, supporting the multi-tenant Hadoop environment and Sherpa. She is also a sparkly devops princess who is passionate about increasing diversity in tech. She doesn’t play favorites for programming languages or editors, but her favorite board game is Waterdeep.

9:00 am–5:00 pm

Grand Ballroom B

If you are interested in running Hadoop and need to understand how to manage a cluster rather than how to write Java applications, this course is for you. If you inherited a non-production environment that now has mission-critical data and you need to harden that Hadoop environment for production, then this course is also for you. In this class, I will teach you the core set of skills needed to deploy, configure, and monitor resources, with an emphasis on giving you the basics so you can use self-directed learning to go deeper. While there will be some exercises to show you how to use Hadoop, the focus of this tutorial and its hands-on exercises is operations.

Who should attend:

Unix system administrators interested in an introduction to Hadoop operations.

Take back to work:
  • Techniques for deploying, configuring, monitoring, and securing a Hadoop cluster
  • Essential Hadoop troubleshooting skills
  • A working Hadoop environment on your laptop
Topics include:
  • Overview of Hadoop and HDFS
  • Sandbox setup for class exercises
  • Architecture
  • Installation
  • Configuration
  • Logging
  • Monitoring
  • Security
Additional Materials:

Dr. Neil J. Gunther, Performance Dynamics Company

Neil Gunther, M.Sc., Ph.D., is a researcher specializing in performance and capacity management. Prior to starting his own consulting company in 1994 (www.perfdynamics.com), Neil worked on the NASA Voyager and Galileo missions, the Xerox PARC Dragon multiprocessor, and the Pyramid/Siemens RM1000 parallel cluster. Neil has authored many technical articles and several books, including Guerrilla Capacity Planning (Springer, 2007) and the second edition of Analyzing Computer System Performance with Perl::PDQ (Springer, 2011), and received the A.A. Michelson Award in 2008.

S4
Tactical Capacity Planning for Sysadmins
Dr. Neil J. Gunther, Performance Dynamics Company

Neil Gunther, M.Sc., Ph.D., is a researcher specializing in performance and capacity management. Prior to starting his own consulting company in 1994 (www.perfdynamics.com), Neil worked on the NASA Voyager and Galileo missions, the Xerox PARC Dragon multiprocessor, and the Pyramid/Siemens RM1000 parallel cluster. Neil has authored many technical articles and several books, including Guerrilla Capacity Planning (Springer, 2007) and the second edition of Analyzing Computer System Performance with Perl::PDQ (Springer, 2011), and received the A.A. Michelson Award in 2008.

9:00 am–5:00 pm

Grand Ballroom D

Most system administrators are already familiar with diagnosing and tuning computer systems using performance data captured by open source or commercial monitoring tools such as Splunk, Graphite, and BMC Patrol.

This full-day tutorial will show you how to get beyond performance monitoring to performance analysis and capacity planning. These skills are in greater demand than ever for sizing (over-engineering can't improve single-threaded performance), for procurement (try buying a 10 GHz processor), and for ensuring the scalability of large infrastructures used in both private and public clouds.

Since computer hardware has become a mass-produced commodity, its cost no longer drives capacity planning in the strategic sense. The capacity part has become cheap and easy; it's the planning part that requires skill. And capacity planning is not just about the future anymore. Rather, it needs to respond rapidly to the fast-paced changes and tighter budgets of modern business environments. Enter tactical planning: Guerrilla-style capacity planning.

Who should attend:

Anyone looking for job security by improving their skill set to include capacity management. No specialized background in performance analysis or capacity planning is assumed. A working knowledge of Linux or Unix performance tools will be helpful.

Take back to work:

The ability to start analyzing performance data you may already have collected to forecast system capacity and predict bottlenecks that can hinder system scalability.

Topics include:
  • What is performance and capacity management?
  • The Guerrilla approach to capacity planning.
  • Monitoring the volatile technology marketplace for procurement.
  • The three performance metrics you need to know.
  • Who ordered multicores and what are their performance limitations?
  • Statistical forecasting with R.
  • How to establish a capacity line.
  • Queueing analysis for those who can’t wait.
  • How to use PDQ for bottleneck analysis in R, C, Perl, and Python.
  • Quantifying scalability using Amdahl's law and the Universal Scalability Law.
  • Virtualization capacity management from core hyperthreads to cloud hyperservices.
  • Scalability analysis of Xen, VMware and WebLogic virtualized servers.
  • Case studies in capacity planning for large-scale web sites and multi-tier applications.
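The scalability topic above has a compact closed form. As a toy sketch (this code and its parameter values are mine for illustration, not Dr. Gunther's PDQ library), the Universal Scalability Law models relative capacity C(N) with a contention term sigma and a coherency term kappa:

```python
import math

def usl_capacity(n, sigma, kappa):
    """C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))"""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# With any coherency cost (kappa > 0), throughput peaks and then
# retrogrades; the peak is at N* = sqrt((1 - sigma) / kappa).
sigma, kappa = 0.05, 0.001
n_star = math.sqrt((1 - sigma) / kappa)
print(round(n_star))                             # 31 processors
print(round(usl_capacity(32, sigma, kappa), 1))  # ~9x one processor
```

In practice sigma and kappa are fitted by regression to measured throughput data, which is where the statistical forecasting with R listed above comes in.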
Additional Materials:

Half Day Morning

Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BS levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

S2
Statistics for Ops: Making Sense Out of Data
Kyrre Begnum, Oslo and Akershus University College of Applied Sciences

Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BS levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

 
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

9:00 am–12:30 pm

Grand Ballroom A

This tutorial is a course in statistics with a specific focus on system administrators and the types of data they face. We assume little prior knowledge of statistics and cover the most common concepts in descriptive statistics, applying them to data taken from real-life examples. Our aim is to provide insight into which methods support sound interpretation of data: distributions, probability, and formulating basic statements about the properties of observed data.

The tutorial instructors will be available in the Lab Space following the tutorial to answer questions and offer personal feedback on cases attendees wish to investigate with their own data.

Who should attend:

Sysadmins who are faced with data overload and wish they had some knowledge of how statistics can be used to make more sense of it. We assume little prior knowledge of statistics, but a basic mathematical proficiency is recommended.

Take back to work:
  • A fundamental understanding of how descriptive statistics can provide additional insight into data in the sysadmin world, enabling further self-study of statistics
  • A basic set of statistical approaches that can be used to identify fundamental properties of the data they see in their own environments, and to identify patterns in that data
  • The ability to make accurate and clear statements about metrics that are valuable to their organization
Topics include:
  • Descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
  • Basic analysis of distributions and probabilities using percentiles typically seen in ops
  • Interpretation of analyses to include team and business implications
  • Regression analysis to suggest predictive relationships, with an emphasis on interpretation and implications
  • Correlation analysis and broad pattern detection (if time allows)
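As a taste of the material above, here is a minimal sketch (the latency sample is invented) of why the median and a high percentile often describe ops data better than the mean:

```python
import statistics

# Ten request latencies in milliseconds, with a long tail:
latencies_ms = [12, 14, 15, 15, 16, 18, 21, 25, 40, 95]

mean   = statistics.mean(latencies_ms)    # 27.1, pulled up by the outlier
median = statistics.median(latencies_ms)  # 17.0, robust to the tail
mode   = statistics.mode(latencies_ms)    # 15, the most common value
# quantiles() with n=10 returns the nine deciles; index 8 is the 90th
# percentile, a common SLA target.
p90 = statistics.quantiles(latencies_ms, n=10)[8]
print(mean, median, mode, p90)
```

The gap between the mean and the median is itself a useful signal: it tells you the distribution is skewed and that a single "average" number would mislead.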
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration,” has just been released.

S5
Work Like a Team: Best Practices for Team Coordination and Collaborations So You Aren't Acting Like a Group of Individuals
Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he’s worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, “The Practice of Cloud System Administration,” has just been released.

9:00 am–12:30 pm

Cedar Room AB

System Administration is a team sport. How can we better collaborate and work as a team? Techniques will include many uses of Google Docs, wikis and other shared document systems, as well as strategies and methods that create a culture of cooperation.

Who should attend:

System administrators and managers who work on a team of three or more.

Take back to work:
  • Understanding the different roles people play within a team
  • Behavior that builds team cohesion
  • Three uses of Google Docs you had not previously considered
  • How to organize team projects to improve teamwork
  • How to track projects using Kanban boards
  • How to divide big projects among team members
  • How to collaborate via the "Tom Sawyer fence painting" technique
  • How to criticize the work of teammates constructively
  • How to get agreement on big plans
Topics include:
  • Meetings: How to make them more effective, shorter, and more democratic
  • How to create accountability, stop revisiting past decisions, and improve involvement
  • Strategies for leaving “fire-fighting” mode and becoming more “project-focused”
  • Project work: Using “design docs” to get consensus on big and small designs before they are committed to code
  • Service docs: How to document services so any team member can cover for any other
  • Kanban: How to manage work that needs to be done
  • Chatroom effectiveness: How to make everyone feel included and avoid losing important decisions
  • Playbooks: How to get consistent results across the team, train new hires, and make delegation easier
  • Sending more effective email: How to write email that gets read


(NOTE: This class is a reboot of last year’s “Advanced Time Management: Team Efficiency”)

Additional Materials:

Half Day Afternoon

Jason Maughan, PurePredictive, Inc.

Jason Maughan is a predictive data analyst at PurePredictive, Inc., where his primary role is machine learning ensemble construction and optimization. He enjoys data exploration and visualization using tools such as R. His mission is to find hidden patterns in data to explain the past and predict the future. Jason holds a Masters in Management Information Systems from Utah State University.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

S6
R for Sysadmins
Jason Maughan, PurePredictive, Inc.

Jason Maughan is a predictive data analyst at PurePredictive, Inc., where his primary role is machine learning ensemble construction and optimization. He enjoys data exploration and visualization using tools such as R. His mission is to find hidden patterns in data to explain the past and predict the future. Jason holds a Masters in Management Information Systems from Utah State University.

 
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

1:30 pm–5:00 pm

Grand Ballroom A

We provide a brief introduction to the R programming and statistics language, with a focus on exploratory data analysis for sysadmins. We assume little prior knowledge of statistics and no prior knowledge of the R language or programming environment. The course includes data input, basic manipulation, visualization and plotting, and basic analysis in R. The instructors will be available in the LISA Labs space after the tutorial for attendees that have additional questions or have brought their own data. R (http://www.r-project.org/) and RStudio (http://www.rstudio.com/), an IDE for R, should be installed prior to attending.

Who should attend:

Sysadmins who would like an introduction to R as a tool for gaining additional insight into their wealth of data using R’s statistical and visualization capabilities. We assume little prior knowledge of R or statistics, but basic mathematical proficiency is recommended.

Take back to work:
  • Experience with basic methods and data types in R
  • Experience with basic visualizations in R
  • Basic understanding of data exploration and analysis to identify patterns in R, such as correlations, regressions, and decision trees (as time allows)
Topics include:
  • An introduction to R and the RStudio programming environment
  • Basic instructions for loading, manipulating, and saving data
  • Basic functions and algorithms for exploring data, and the types of analysis useful for sysadmins
  • An introduction to descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
  • Basic visualizations in R, including histograms, scatterplots, and heatmaps (as time allows)
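The class itself uses R, but the descriptive statistics it introduces behave the same in any environment. As a language-neutral taste (the sample values are invented), the same measures in Python:

```python
from statistics import mean, median, mode

# Hypothetical sample: response times in ms pulled from a server log
samples = [12, 15, 15, 18, 22, 15, 30, 12, 15]

print("mean:  ", round(mean(samples), 2))    # arithmetic average
print("median:", median(samples))            # middle value, robust to outliers
print("mode:  ", mode(samples))              # most frequent value
print("range: ", max(samples) - min(samples))
```

In R, mean() and median() are direct equivalents, and range() returns the min/max pair; RStudio makes exploring them interactive.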
Additional Materials:

Jean-Charles Lopez, Inktank

Jean-Charles has been an instructor in IT storage environments for the last 15 years, working with equipment from many different vendors. He joined Inktank (since acquired by Red Hat, of which it is now a division) in December 2013 as Senior Technical Instructor and head of Inktank University, helping drive the adoption of Ceph in general and the Red Hat Inktank Ceph Enterprise (Red Hat ICE) offering in particular.

S7
Introduction to Ceph
Jean-Charles Lopez, Inktank

Jean-Charles has been an instructor in IT storage environments for the last 15 years, working with equipment from many different vendors. He joined Inktank (since acquired by Red Hat, of which it is now a division) in December 2013 as Senior Technical Instructor and head of Inktank University, helping drive the adoption of Ceph in general and the Red Hat Inktank Ceph Enterprise (Red Hat ICE) offering in particular.

1:30 pm–5:00 pm

Cedar Room AB

Learn about the history of Ceph and how Ceph, as Software-Defined Storage, uses its different components to offer object access (S3- and Swift-compatible), block-level access, and file-level access. The lecture is combined with instructor-led demonstrations using Ceph.

Who should attend:

Any system or storage administrator who wants to get a better and more in-depth understanding of Ceph. Any CIO or CTO who would like to know how Ceph could benefit their infrastructure and help contain costs.

Take back to work:
  • Virtual Machine images for playing with Ceph
  • Lab instructions on how to set up and play with a test cluster
Topics include:
  • Overview of the Ceph Architecture
  • Different access methods to the Ceph Storage Cluster
  • Hands-on using Ceph
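One idea behind the architecture covered above: Ceph clients compute where data lives instead of asking a central lookup service. An object name hashes to a placement group, and the CRUSH algorithm then maps that group to OSDs. A toy Python sketch of the first step (Ceph's real hash function and a pool's PG count differ; both are simplified here):

```python
import hashlib

PG_NUM = 64  # number of placement groups in this hypothetical pool

def object_to_pg(name: str) -> int:
    """Deterministically map an object name to a placement group,
    in the spirit of Ceph's name-hashing step (Ceph uses a different hash)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_NUM

# Every client computes the same placement with no central metadata lookup:
assert object_to_pg("vm-image-001") == object_to_pg("vm-image-001")
```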
Additional Materials:
Monday, November 10, 2014

Full Day

Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

M1
Linux Performance Tuning
Theodore Ts'o, Google

Theodore Ts'o has been a Linux kernel developer since almost the very beginnings of Linux: he implemented POSIX job control in the 0.10 Linux kernel. He is the maintainer and author of the Linux COM serial port driver and the Comtrol Rocketport driver, and he architected and implemented Linux's tty layer. Outside of the kernel, he is the maintainer of the e2fsck filesystem consistency checker. Ted is currently employed by Google.

9:00 am–5:00 pm

Grand Ballroom C

The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

Who should attend:

Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

Take back to work:

The ability to hone your Linux systems for the specific tasks they need to perform.

Topics include:
  • Strategies for performance tuning
  • Characterizing your workload's requirements
  • Finding bottlenecks
  • Tools for measuring system performance
  • Memory usage tuning
  • Filesystem and storage tuning
  • NFS performance tuning
  • Network tuning
  • Latency vs. throughput
  • Capacity planning
  • Profiling
  • Memory cache and TLB tuning
  • Application tuning strategies
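Much of the measurement work above starts with reading /proc. A minimal sketch of parsing /proc/meminfo-style output (the sample text is hardcoded so the snippet stands alone):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

sample = """MemTotal:       16384000 kB
MemFree:         2048000 kB
Cached:          8192000 kB"""

mem = parse_meminfo(sample)
# MemFree alone understates headroom; page cache is reclaimable too
print((mem["MemFree"] + mem["Cached"]) // 1024, "MB roughly available")
```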
Additional Materials:

Nathen Harvey, Chef

Nathen Harvey is the Director of Community at Chef. Before joining Chef, Nathen spent a number of years managing operations and infrastructure for a number of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps. He participates in many different meetup groups and organizes the DevOpsDC Meetup Group. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and foursquare.

M3
Testing Your Automation Code
Nathen Harvey, Chef

Nathen Harvey is the Director of Community at Chef. Before joining Chef, Nathen spent a number of years managing operations and infrastructure for a number of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps. He participates in many different meetup groups and organizes the DevOpsDC Meetup Group. When not working or hosting meetups, Nathen enjoys going to concerts, drinking craft beer, and over-sharing on sites like Twitter, untappd, and foursquare.

9:00 am–5:00 pm

Cedar Room AB

This is a hands-on tutorial that will cover the basics that everyone needs to know about how to test your automation code. We’ll start off with a quick introduction to Chef and work our way through writing a fully-tested cookbook or two using linting, unit testing, integration testing, and cross-platform testing.

Hands-on exercises throughout the tutorial will reinforce the material discussed.

Who should attend:

Anyone responsible for managing infrastructure, especially those interested in automating its provisioning and management using state-of-the-art tools and practices.

Take back to work:

A working code base that includes samples for building out testable infrastructure components.

Topics include:
  • Introduction to Chef
  • Test-driven development (TDD)
  • Syntax checking with knife cookbook test
  • Linting with Foodcritic
  • Unit testing with ChefSpec
  • Integration testing with ServerSpec
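The tooling above is Ruby-based, but the TDD loop it supports is language-neutral: write an assertion about the desired output first, then make the automation code satisfy it. A Python sketch with a hypothetical template-rendering helper (not part of Chef):

```python
def render_ntp_conf(servers):
    """Render a minimal ntp.conf from a list of server names;
    a stand-in for what a Chef template resource would produce."""
    if not servers:
        raise ValueError("at least one NTP server is required")
    return "\n".join(f"server {s} iburst" for s in servers) + "\n"

# ChefSpec-style unit tests assert on rendered output, not on a live host:
conf = render_ntp_conf(["0.pool.ntp.org", "1.pool.ntp.org"])
assert "server 0.pool.ntp.org iburst" in conf
assert conf.count("server ") == 2
```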
Additional Materials:

Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

M4
Hands on Security for System Administrators
Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

9:00 am–5:00 pm

Grand Ballroom B

System administrators often run into interesting conflicts between sysadmin, customer, and IT security needs. These conflicts generally lead to difficulty in achieving a balance between administrative convenience, good customer service, and minimal risk. There are processes or services available that can significantly improve any of these areas; however, many times they are costly or resource intensive. This course is designed for system administrators who want to improve the security posture of their organizations, using IT security standards paired with good system administration practices, resulting in better service, lower risk, and minimal impact to time and budget.

We will walk a path, covering many domains of IT security and enterprise system administration in ways that utilize the interconnection between good system administration practices and security fundamentals. We will discuss recent risks and threats, analyze them with respect to your environment, review possible impacts and develop mitigations that you can apply immediately. Training includes instruction, discussion, many hands-on labs, and a strong dose of common sense.

Attendees should bring a laptop capable of running a Virtual Guest and will be provided a VM in which to work. The class will have access to a test network of systems for training and lab exercises.

Who should attend:

Beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to improve their security. It is suggested that participants have experience with the *nix command line.

Take back to work:

A toolbox of documentation, (mostly) free software, and a good starting point for developing better practices to improve security and system administration.

Topics include:
  • The relationship between system administration and IT security
  • Security theories, standards, and risk mitigation as applied by SAs
  • Information management using Trac and Subversion
  • Good system administration practices that directly improve IT security
  • Basic configuration driven system management using Puppet
  • Host and network auditing, hardening, and monitoring
  • Developing an effective security awareness program
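As one concrete example of a practice that directly improves IT security: file-integrity monitoring, the idea behind tools such as Tripwire and AIDE, reduces to baselining checksums and alerting on drift. A self-contained sketch (the file contents are inlined rather than read from disk):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def changed_files(baseline: dict, current: dict) -> list:
    """Compare two {path: checksum} maps; report modified or missing paths."""
    return sorted(path for path in baseline
                  if current.get(path) != baseline[path])

baseline = {"/etc/passwd": sha256_of(b"root:x:0:0::/root:/bin/bash\n")}
tampered = {"/etc/passwd": sha256_of(b"root:x:0:0::/root:/bin/sh\n")}
assert changed_files(baseline, baseline) == []
assert changed_files(baseline, tampered) == ["/etc/passwd"]
```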
Additional Materials:

Half Day Morning

Nick Chase, Mirantis

Nick Chase has 20+ years of experience as a developer and author. He has written several books and hundreds of articles as an IBM developerWorks Certified Master Author, founded NoTooMi.com, and has done web application development for companies such as Alcatel-Lucent, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. He is currently Mirantis' Technology Marketing Manager and the Editor of OpenStack:Now.

M2
Understanding OpenStack
Nick Chase, Mirantis

Nick Chase has 20+ years of experience as a developer and author. He has written several books and hundreds of articles as an IBM developerWorks Certified Master Author, founded NoTooMi.com, and has done web application development for companies such as Alcatel-Lucent, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. He is currently Mirantis' Technology Marketing Manager and the Editor of OpenStack:Now.

9:00 am–12:30 pm

Grand Ballroom D

This session is designed to give you an understanding of the architecture of OpenStack, how the pieces fit together, and how you can make use of it in your environment. The session provides an understanding of the OpenStack architecture and how to use it. It also includes a hands-on lab during which students will be able to provision and use resources from an existing OpenStack cluster. Students need to bring an internet-capable laptop with a modern browser.

Who should attend:

This tutorial will be best suited for those who are either curious about or considering OpenStack, but don't have a deep understanding of how it works. This can include users from companies who are considering private or public cloud, or who are already virtualized in some capacity, such as using vCenter, or who are simply interested in increasing capacity and speed-to-market.

Take back to work:

Attendees will go back to work with an understanding of how OpenStack fits into a business environment, and a good grounding in how it works and how it can be applied in their own situations.

Topics include:
  • OpenStack architecture and components
  • How requests flow through an OpenStack cluster
  • Virtual machine provisioning and workload scheduling
  • Networking architecture
  • Storage architecture
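Everything in OpenStack is driven through REST APIs; provisioning a VM means POSTing a JSON body naming an image and a flavor to the compute service (Nova). A sketch of building that body (the image UUID and flavor ID are placeholders):

```python
import json

def server_create_request(name, image_ref, flavor_ref):
    """Build the JSON body for a Nova v2 'create server' call
    (POST /v2/{tenant_id}/servers)."""
    return {"server": {"name": name,
                       "imageRef": image_ref,
                       "flavorRef": flavor_ref}}

body = server_create_request("web01", "IMAGE-UUID-PLACEHOLDER", "2")
print(json.dumps(body, indent=2))
```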
Additional Materials:

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

M5
Winning the Budget Game: How to Get the Money You Need for IT Every Time
Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

9:00 am–12:30 pm

Grand Ballroom A

Getting the resources your team needs is a matter of knowing just enough about finances to communicate what you want, and explaining how it will benefit the company. The trick is to speak their language. Learn the basics of budgeting, benchmarking, resource allocation, cost-benefit analysis, and communicating costs using tools like net present value. Understand the importance of your business cycle and the difference between cap-ex and op-ex. The course will also cover spreadsheet magic, like pivot tables and pivot charts.

Who should attend:

Sysadmins who want to save their IT budgets, and need to learn how to speak “MBA”. This tutorial will help you do just that, arming you with the right approach and the right tools to go to management and communicate your needs and the alternatives available, so that you’re a part of the budget conversation instead of just the recipient of budget decisions.

Take back to work:
  • Understand the differences between cap-ex and op-ex, and leverage differences in business cycles and settings.
  • Write a basic capital proposal, including relevant value analysis.
  • Draft a budget, complete with impacts.
  • Compare alternatives and communicate relevant costs.
  • Gain a fundamental understanding of pivot tables and pivot charts.
Topics include:
  • Capital expenses vs. operational expenses: knowing the difference, and knowing why the difference matters
  • Important business cycles and dates
  • Comparing alternatives and communicating relevant costs
  • Cost-benefit analysis
  • Budgeting, including drafting a budget, burn rates, and standard costs and variances
  • Capital planning, including net present value, PV index, internal rate of return, and payback period
  • Pivot tables and pivot charts
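Two of the capital-planning measures above fit in a few lines of code: net present value discounts each year's cash flow by (1 + r)^t, and payback period is the first year cumulative cash flow reaches zero. A sketch with invented numbers:

```python
def npv(rate, cashflows):
    """Net present value; cashflows[0] is today (t=0), then one per year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def payback_period(cashflows):
    """First year in which cumulative cash flow reaches zero, or None."""
    total = 0.0
    for t, cf in enumerate(cashflows):
        total += cf
        if total >= 0:
            return t
    return None

# Spend $1000 on hardware now, save $500/year in op-ex for three years, 10% rate:
flows = [-1000, 500, 500, 500]
print(round(npv(0.10, flows), 2))   # positive: the proposal clears the hurdle
print(payback_period(flows))        # breaks even in year 2
```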
Additional Materials:

Half Day Afternoon

John Sellens, SYONEX

John Sellens has been involved in system and network administration for over 25 years, and has been teaching and writing on related topics for many years. He holds an M.Math. in computer science from the University of Waterloo. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the operations team at FreshBooks.

M6
Elasticsearch, Logstash, and Other Data
John Sellens, SYONEX

John Sellens has been involved in system and network administration for over 25 years, and has been teaching and writing on related topics for many years. He holds an M.Math. in computer science from the University of Waterloo. He is the proprietor of SYONEX, a systems and networks consultancy, and is currently a member of the operations team at FreshBooks.

1:30 pm–5:00 pm

Grand Ballroom D

Elasticsearch is a distributed and reliable data store that can be used for a variety of purposes. One use of particular interest to system administrators is as a storage engine for Logstash. This tutorial covers how to implement an Elasticsearch cluster and use Logstash and related tools to store and query log data (syslog, web logs, etc.).

Who should attend:

System administrators who need a tool to aggregate and examine log data across their environment.

Take back to work:

Participants will leave the tutorial ready to implement and manage an Elasticsearch cluster, to store and analyze their logs and other data with Logstash, and to use Elasticsearch with other kinds of data.

Topics include:
  • An introduction to Elasticsearch
  • How to implement and manage a replicated and distributed data store
  • How to use Logstash to store log files (or other time-stamped data)
  • Tools for querying and analyzing those logs
  • Ways to use and abuse Elasticsearch for other types of data
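A Logstash event is just a JSON document stamped with @timestamp, and getting it back is a JSON query body against Elasticsearch's search API. A sketch of building such a body in Python (the field names are examples, and the exact DSL shape varies across Elasticsearch versions):

```python
import json

def recent_events_query(host, since):
    """Build an Elasticsearch query-DSL body: match a host field,
    filtered to a time range on Logstash's @timestamp."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"host": host}}],
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        }
    }

body = recent_events_query("web01", "now-1h")
print(json.dumps(body))
```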
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

M7
Evil Genius 101: Subversive Ways to Promote DevOps and Other Big Changes
Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

1:30 pm–5:00 pm

Grand Ballroom A

You want to innovate: deploy new technologies such as configuration management, kanban, a wiki, or standardized configurations. Your coworkers don’t want change: they like the way things are. Therefore, they consider you evil. However, you aren’t evil; you just want to make things better. Learn how to talk your team, managers, and executives into adopting DevOps techniques and culture.

Who should attend:

Sysadmins and managers looking to influence the technology and culture of their organization.

Take back to work:
  • Help your coworkers understand and agree with your awesome ideas
  • Convince your manager about anything. Really.
  • Get others to trust you so they are more easily convinced
  • Decide which projects to do when you have more projects than time
  • Turn the most stubborn user into your biggest fan
  • Make decisions based on data and evidence
Topics include:
  • DevOps "value mapping" exercise: Understand how your work relates to business needs.
  • So much to do! What should you do first?
  • How to sell ideas to executives, management, co-workers, and users.
  • Simple ways to display data to get your point across better.
Additional Materials:
Tuesday, November 11, 2014

Full Day

Stuart Kendrick, EMC Isilon

Stuart Kendrick works as a Sustaining Engineer for EMC Isilon. He has worked in software development, help desk, desktop support, system administration, and network support. He spent two decades at the Fred Hutchinson Cancer Research Center in a multi-disciplinary role covering transport, network management, root cause analysis, and Problem Management. He is happiest when correlating packet traces with syslog extracts, writing scripts to query device MIBs, and facilitating hands-on classes at LISA, Cascadia, and Sharkfest.

Chris Shaiman, Watchguard Technologies

Chris Shaiman is the Lead Network Engineer and Security Analyst for WatchGuard Technologies. Previously, he worked at Cingular Wireless and Xerox. Specializing in Networking and Security, he recently acquired his Certified Information Systems Security Professional certificate and has been enjoying building a more robust security program at his current employer. When not focusing on his love for networking and security related sciences, he is working on completing his hot yoga teacher training, and bioponic research.

T1
Myth-Busting: The Network Layer
Stuart Kendrick, EMC Isilon

Stuart Kendrick works as a Sustaining Engineer for EMC Isilon. He has worked in software development, help desk, desktop support, system administration, and network support. He spent two decades at the Fred Hutchinson Cancer Research Center in a multi-disciplinary role covering transport, network management, root cause analysis, and Problem Management. He is happiest when correlating packet traces with syslog extracts, writing scripts to query device MIBs, and facilitating hands-on classes at LISA, Cascadia, and Sharkfest.

 
Chris Shaiman, Watchguard Technologies

Chris Shaiman is the Lead Network Engineer and Security Analyst for WatchGuard Technologies. Previously, he worked at Cingular Wireless and Xerox. Specializing in Networking and Security, he recently acquired his Certified Information Systems Security Professional certificate and has been enjoying building a more robust security program at his current employer. When not focusing on his love for networking and security related sciences, he is working on completing his hot yoga teacher training, and bioponic research.

9:00 am–5:00 pm

Willow Room A

Our industry has its share of urban legends. In this tutorial, we split into teams, pick an adventure, and practice a methodology to separate myth from reality. We then come together as a class, listen to each other's reports, discuss what is surprising and how the underlying technology actually works. By the end of the day, you've repeated this cycle several times, have experience employing the methodology, and are ready to apply these techniques back at the office. You bring basic equipment and complete your homework prior to class; we provide test environments and coaching.

Who should attend:

Sysadmins and network engineers involved in designing operational IT environments or troubleshooting client/server problems … and who enjoy discovering that sometimes what we believe just ain’t so.

Take back to work:

Practice using a methodology for sanity-checking performance claims, a deeper understanding of popular technologies, and experience working in ad-hoc teams.

Topics include:

Your team will pick several adventures during the course of the day:

  • Jumbo Frames: Do they make your server go faster?
  • Packet loss: When does it start to matter?
  • Hops: How many switches does it take to slow down your server?
  • Contention: When the pipe is full, how much of a difference does it make?
  • Firewalls: The bad boys of the network -- how much do they skim off the top?
  • WAN Compression: How much does this buy us?
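For the packet-loss adventure, a handy sanity check before lab time is the Mathis et al. approximation: steady-state TCP throughput is capped at roughly (MSS/RTT) · C/√p, so a 100× drop in loss buys only 10× more throughput. A sketch (the constant and inputs are illustrative):

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Mathis et al. ceiling on steady-state TCP throughput, in bits/sec."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# 1460-byte MSS at 100 ms RTT: compare 0.01% loss against 1% loss
clean = tcp_throughput_bps(1460, 0.100, 0.0001)
lossy = tcp_throughput_bps(1460, 0.100, 0.01)
print(f"{clean / 1e6:.1f} Mb/s vs {lossy / 1e6:.1f} Mb/s")
```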
Additional Materials:

Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

T2
Defending Against the Dark Arts
Branson Matheson, Blackphone

Branson is a 27-year veteran of system administration and security. He started in cryptology for the Navy and has since worked on NASA shuttle projects, TSA security systems, and search engines, and he supports many open-source projects. He founded sandSecurity to provide policy and technical audits, support, and training for IT security and system administrators. Branson is currently the Director of IT for Blackphone and generally likes to spend time responding to the statement "I bet you can't…"

9:00 am–5:00 pm

Grand Ballroom C

Today's threats to the enterprise are manifested in many ways, but all share similar traits: they are highly intelligent, well funded, and determined to gain access. In this class, we will explore the murky world of the black hats. We will examine your security footprint as they view it and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. As with all my classes, this will be accompanied by a pinch of humor and a large dollop of common sense.

Who should attend:

Participants should be beginning to mid-level system administrators of any stripe with an interest in IT security and a desire to understand their potential adversaries. It is suggested that participants have experience with the *nix command line and virtual hosts. This course is a continuation of, and will build on, what is taught in Hands on Security for System Administrators. However, attending that course is not a prerequisite.

Take back to work:

Tools, tips, tricks, and a working security toolkit that can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

Topics include:
  • Understanding an attack from beginning to end
  • Security Podiatry
  • Social Engineering detection and prevention
  • Detecting undesirable network and server behavior
  • Detecting penetration attempts and successes, and stopping them
  • Raising awareness
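Detecting undesirable server behavior often begins with nothing fancier than counting log lines. A sketch that flags source IPs with repeated failed SSH logins (the log lines are fabricated samples in the usual auth.log style):

```python
import re
from collections import Counter

FAILED = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def suspicious_ips(log_lines, threshold=3):
    """Return source IPs with at least `threshold` failed password attempts."""
    hits = Counter(m.group(1) for line in log_lines
                   if (m := FAILED.search(line)))
    return {ip for ip, count in hits.items() if count >= threshold}

sample = ["sshd[99]: Failed password for root from 203.0.113.9 port 4022 ssh2"] * 4
sample += ["sshd[12]: Failed password for admin from 198.51.100.7 port 4100 ssh2"]
assert suspicious_ips(sample) == {"203.0.113.9"}
```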
Additional Materials:

Half Day Morning

Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps and pre- and post-sales delivery, at RMI Corp (acquired by NetLogic), and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

T3
SDN Essentials I
Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

 
Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps and pre- and post-sales delivery, at RMI Corp (acquired by NetLogic), and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

9:00 am–12:30 pm

Grand Ballroom B

This course will introduce attendees to a disruptive transformation underway in networking. They will learn the fundamentals of Software-Defined Networking (SDN), including its motivation, basic building blocks, and standard abstractions. Bring your laptop to participate in exercises.

Who should attend:

Any attendee of LISA’14 who is interested in learning about how SDN and OpenFlow present major transformative opportunities to networking professionals.

Take back to work:

The fundamental understanding of Software-Defined Networking (SDN) including its motivation, basic building blocks and standard abstractions.

Topics include:
  • Introduction to SDN
  • What is SDN? 
  • SDN Architecture
  • Benefits of SDN
  • The SDN Stack
  • SDN Use Cases and Early Deployments
  • Wrap-Up: Thinking Differently about Networking
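The standard abstraction at the core of OpenFlow is the flow table: prioritized match rules mapped to actions, installed by a controller. A toy Python lookup (the field names and actions are simplified stand-ins):

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority rule whose match fields
    all agree with the packet; an empty match is a table-miss catch-all."""
    for priority, match, action in sorted(flow_table, key=lambda r: r[0],
                                          reverse=True):
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "drop"

table = [
    (100, {"dst_ip": "10.0.0.2"}, "output:port2"),  # specific forwarding rule
    (10,  {},                     "controller"),    # miss: punt to controller
]
assert lookup(table, {"dst_ip": "10.0.0.2"}) == "output:port2"
assert lookup(table, {"dst_ip": "10.0.0.9"}) == "controller"
```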
Additional Materials:

David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

T4
Build a Sysadmin Sandbox
David N. Blank-Edelman, Northeastern University College of Computer and Information Science

David is the Director of Technology at the Northeastern University College of Computer and Information Science and the author of the O'Reilly book Automating System Administration with Perl.

David has spent the last 27+ years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was the program chair of the LISA '05 conference and one of the LISA '06 Invited Talks co-chairs. David is also the recipient of the 2009 SAGE Outstanding Achievement award and is honored to serve on the USENIX Board of Directors.

9:00 am–12:30 pm

Grand Ballroom A

Every sysadmin I know has at least four new technologies they would like to play with—and that's just before breakfast. But finding the time to do this can be difficult, often because of the effort it takes to find a machine, set up the network connection, firewall it from the other machines, and so on... If you want to try something that requires multiple components, that's even worse.

What you need is a sandbox: a place where you can experiment with new technologies, mess with cool tools, test the latest versions of software, prototype new services, and so on, without any sort of heavyweight effort.

Who should attend:

Sysadmins who want to test technologies safely and efficiently.

Take back to work:

The ability to build your own sysadmin sandboxes.

Topics include:
  • Using software and services like Git, Vagrant, Amazon EC2, and Docker to make sandbox construction easy and free/inexpensive
  • When a sandbox approach is and is not appropriate
  • Migrating your work in and out of sandboxes
  • How to share the work you've done in a sandbox with others
Additional Materials:

Carolyn Rowland, National Institute of Standards and Technology (NIST)

Carolyn Rowland began her UNIX system administration career in 1991 and currently leads an ops/dev team at the National Institute of Standards and Technology (NIST). She credits her success with being able to be the bridge between senior management and technology. Her team has distinguished itself as a leader in the development of new technology solutions that solve business and research problems within the Engineering Laboratory and across the NIST campus.

Nicole Forsgren Velasquez, Utah State University

Nicole Forsgren Velasquez is considered an expert in the work, tools, knowledge sharing, and communication of technical professionals. Her background spans user experience, enterprise storage, cost allocation, and systems design and development. She has worked with large and small corporations across many industries and government. She holds a Ph.D. in Management Information Systems and a Masters in Accounting from the University of Arizona. She is currently an Assistant Professor at Utah State University.

T5
Navigating the Business World
Carolyn Rowland, National Institute of Standards and Technology (NIST)

Nicole Forsgren Velasquez, Utah State University

9:00 am–12:30 pm

Grand Ballroom D

This tutorial provides guidelines and suggestions to improve the communication and collaboration of sysadmins and directly increase effectiveness when working with senior management.

We will provide you with skills that are associated with senior IT operations staff. The focus is on understanding the gap between IT operations and management and how to bridge that gap to increase respect for IT operations and improve your relationship with management. We will cover identifying your role within the organization, effective communication with upper management, positioning yourself to increase your visibility, and becoming the go-to person/group for IT within your organization. The instructors make this an interactive class by building on attendees' real situations, including specific ideas to help address immediate issues.

Who should attend:

Technical staff who want or need to work more effectively with senior management.

Take back to work:

The ability to position IT operations to be a valuable partner in your organization. Expect to think differently about IT operations and business when you leave the class.

Topics include:
  • The divide between management and IT operations
  • Challenges that impact IT operations teams
  • Ways to change the perception of IT operations in your organization
  • The importance of culture
  • How DevOps lessons translate to business
Additional Materials:

Half Day Afternoon

Saurav Das, SDN Academy

Saurav Das has been involved with SDN and OpenFlow since 2008, as part of his PhD work at Stanford. After graduating, he spent two years in the engineering team at Big Switch Networks, working on controller platform scalability in large data center networks. Before Stanford, he spent several years in the networking industry designing sub-systems at Enablence. He holds a PhD in Electrical Engineering from Stanford University, and an MS from the University of Arizona, Tucson.

Aseem Parikh, SDN Academy

Prior to founding the SDN Academy, Aseem served as senior vice president at Guavus, where he was instrumental in product definition, roadmaps, and pre- and post-sales delivery; he has also held positions at RMI Corp (acquired by NetLogic) and in Cisco Systems’ Optical Networking Group. He came to Cisco via the acquisition of iManage/Pipelinks, where he served as the co-founder and vice president of engineering. In the past, he has also held senior positions at Novell. Aseem earned an M.S.E.E. from UCLA.

T6
SDN Essentials II
Saurav Das, SDN Academy

Aseem Parikh, SDN Academy

1:30 pm–5:00 pm

Grand Ballroom B

This course will build on the morning tutorial, and use hands-on exercises to delve into the details of the various components of the SDN stack and OpenFlow API. Bring your laptop to participate in exercises.

Who should attend:

DevOps engineers, network admins, network planners, architects, and software engineers who want a deeper dive into the "how" of SDN.

Take back to work:

Experience with tools and simulation environment to start exploring SDN technologies.

Topics include:
  • Review of SDN and Architecture
  • Components of SDN Stack
  • Intro to OpenFlow 
  • Play with OpenFlow (Hands-on exercise)
  • The SDN Stack: Switches
  • The SDN Stack: Network Operating System/Controllers
  • Build a Learning Switch (Hands-on exercise)
  • Network Functions Virtualization (NFV)
  • SDN - A Paradigm Shift
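The "Build a Learning Switch" exercise centers on a simple idea: learn which port each source MAC address arrives on, forward out a known port, and flood otherwise. A minimal sketch of that logic, assuming nothing about any particular controller framework (the Frame and Switch types and the port numbering are illustrative, not the exercise's actual code):

```go
package main

import "fmt"

// Frame carries the fields a learning switch cares about.
type Frame struct {
	Src, Dst string // MAC addresses
	InPort   int    // ingress port
}

// Switch holds the learned MAC-to-port table.
type Switch struct {
	macTable map[string]int
	numPorts int
}

func NewSwitch(ports int) *Switch {
	return &Switch{macTable: map[string]int{}, numPorts: ports}
}

// Forward learns the source MAC, then either forwards the frame out a
// known port or floods it to every port except the ingress port.
func (s *Switch) Forward(f Frame) []int {
	s.macTable[f.Src] = f.InPort // learn where Src lives
	if port, ok := s.macTable[f.Dst]; ok {
		return []int{port}
	}
	out := []int{}
	for p := 0; p < s.numPorts; p++ {
		if p != f.InPort {
			out = append(out, p) // flood
		}
	}
	return out
}

func main() {
	sw := NewSwitch(4)
	fmt.Println(sw.Forward(Frame{Src: "aa", Dst: "bb", InPort: 1})) // unknown dst: flood
	fmt.Println(sw.Forward(Frame{Src: "bb", Dst: "aa", InPort: 2})) // known dst: one port
}
```

In an OpenFlow setting the same table lives in the controller, which reacts to packet-in events and installs flow entries rather than forwarding frames itself.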
Additional Materials:

Chris "Mac" McEniry, Sony Network Entertainment

Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

T7
A Quick Introduction to System Tools Programming with Go
Chris "Mac" McEniry, Sony Network Entertainment

1:30 pm–5:00 pm

Grand Ballroom D

This tutorial provides an introduction to Go with a focus on using it for everyday tooling.

Who should attend:

Sysadmins who want to develop their own tooling, or have a general interest in the Go programming language.

Take back to work:

Ability to read and write the Go language. Create, understand, and extend tools written in Go.

Topics include:
  • Understanding of the applicability of Go to everyday work
  • The Go environment and toolset
  • Go language fundamentals:
  • Control Structures
  • Functions
  • Data Types
  • Methods and Interfaces
  • Goroutines
  • Channels
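To give a flavor of the last two topics, goroutines and channels compose naturally into a worker pool. A minimal, self-contained sketch (the worker function and the squaring "jobs" are invented for illustration, not course material):

```go
package main

import "fmt"

// worker consumes jobs from one channel and sends results on another.
// The range loop exits when the jobs channel is closed.
func worker(id int, jobs <-chan int, results chan<- int) {
	for j := range jobs {
		results <- j * j // square each job
	}
}

func main() {
	jobs := make(chan int, 5)
	results := make(chan int, 5)

	// Start three concurrent workers.
	for w := 1; w <= 3; w++ {
		go worker(w, jobs, results)
	}

	// Send the work, then close the channel so the workers finish.
	for j := 1; j <= 5; j++ {
		jobs <- j
	}
	close(jobs)

	// Collect all five results; arrival order is nondeterministic,
	// but the sum is not.
	sum := 0
	for i := 0; i < 5; i++ {
		sum += <-results
	}
	fmt.Println(sum) // 1+4+9+16+25 = 55
}
```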
Additional Materials:

Thomas A. Limoncelli, Stack Exchange, Inc.

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator. His best-known books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Addison-Wesley). He works in New York City at Stack Exchange, home of ServerFault.com and StackOverflow.com. Previously he worked at small and large companies, including Google and Bell Labs. http://EverythingSysadmin.com is his blog. His new book, The Practice of Cloud System Administration, has just been released.

T8
How To Not Get Paged: Managing Oncall to Reduce Outages
Thomas A. Limoncelli, Stack Exchange, Inc.

1:30 pm–5:00 pm

Grand Ballroom A

People think of “oncall” as responding to a pager that beeps because of an outage. In this class you will learn how to use oncall as a vehicle to improve system reliability so that you get paged less often.

This talk includes never-before-seen material from the new book The Practice of Cloud System Administration by Limoncelli, Chalup, and Hogan.

Who should attend:

Anyone with an oncall responsibility (or their manager).

Take back to work:
  • How to monitor more accurately so you get paged less
  • How to design an oncall schedule so that it is more fair and less stressful
  • How to assure preventative work and long-term solutions get done between oncall shifts
  • How to conduct “Fire Drills” and “Game Day Exercises” to create antifragile systems
  • How to write a good Post-mortem document that communicates better and prevents future problems
Topics include:
  • Why your monitoring strategy is broken and how to fix it
  • Building a more fair oncall schedule
  • Monitoring to detect outages vs. monitoring to improve reliability
  • Alert review strategies
  • Conducting “Fire Drills” and “Game Day Exercises”
  • "Blameless Post-mortem documents"
Additional Materials:
Monday, February 16, 2015

Half Day Morning

Eno Thereska, Microsoft Research

Eno Thereska is a Researcher at Microsoft Research in Cambridge, UK. He has broad interests in computer systems. He has over 30 academic publications in top conferences in the field of storage systems and operating systems, including FAST, OSDI, SOSP, SIGMETRICS, and CHI. He served as technical co-chair of the File and Storage Systems Conference (FAST '14). Eno is a recipient of the 2014 IEEE William R. Bennett Prize, the IEEE INFOCOM 2011 Best Paper award, and the USENIX FAST Best Paper and Best Student Paper awards in 2005 and 2004, respectively. He graduated with a Ph.D. from Carnegie Mellon University in 2007.

Greg O'Shea, Microsoft Research

Greg O'Shea is a software engineer in the Systems and Networking group at Microsoft Research, Cambridge, UK. He has worked extensively in developing and evaluating experimental network and storage systems and has published his findings in SIGCOMM, NSDI, MobiCom, OSDI, and SOSP. Greg’s work has been incorporated into several Microsoft products, such as Windows, Hyper-V, and Windows Server. His latest work is on Storage Quality of Service and is included in Windows Server Technical Preview. He has also developed the Microsoft Research Storage Toolkit, a development kit for software-defined storage. Greg has a Ph.D. from London University.

M1
Software-defined Storage from an SDN Perspective
Eno Thereska, Microsoft Research

Greg O'Shea, Microsoft Research

9:00 am–12:30 pm

Grand Ballroom B

This tutorial will provide technical background on the (often-vague) concept of software-defined storage (SDS). The technical contribution of this tutorial is a definition of SDS that builds on recent work in network systems and applies it to storage. This work includes basic concepts such as classification, routing and forwarding, and the separation of control and data planes. Surprisingly, these basic concepts do not apply well to the storage stack today, making it difficult to enforce end-to-end storage policies.
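As a rough illustration of those network concepts carried over to storage (this sketch is not from the tutorial materials): a control plane installs classification rules, and a data plane matches each IO request against them to pick a policy. The Rule and IORequest types and the policy names are all hypothetical.

```go
package main

import "fmt"

// IORequest is an illustrative stand-in for an IO flowing down the
// storage stack.
type IORequest struct {
	Tenant string
	Op     string // "read" or "write"
}

// Rule is a classification entry, analogous to a flow-table entry.
type Rule struct {
	Tenant string
	Op     string
	Policy string // e.g. a bandwidth or priority class
}

// DataPlane holds the rules the control plane has installed.
type DataPlane struct{ rules []Rule }

// Install is the control-plane action: add a classification rule.
func (d *DataPlane) Install(r Rule) { d.rules = append(d.rules, r) }

// Classify is the data-plane action: first matching rule wins,
// otherwise fall back to a default policy.
func (d *DataPlane) Classify(req IORequest) string {
	for _, r := range d.rules {
		if r.Tenant == req.Tenant && r.Op == req.Op {
			return r.Policy
		}
	}
	return "best-effort"
}

func main() {
	dp := &DataPlane{}
	dp.Install(Rule{Tenant: "vm1", Op: "write", Policy: "high-priority"})
	fmt.Println(dp.Classify(IORequest{Tenant: "vm1", Op: "write"})) // high-priority
	fmt.Println(dp.Classify(IORequest{Tenant: "vm2", Op: "read"}))  // best-effort
}
```

The tutorial's point is that real storage stacks lack exactly this clean separation, which is what makes end-to-end policy enforcement hard.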

There will be a short, hands-on exercise that requires Windows 8.1.

Download the presentation slides (PPTX) for this tutorial.

Additional Materials:

Brent Welch, Google

Brent Welch is a senior staff software engineer at Google. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

M2
Clustered and Parallel Storage System Technologies
Brent Welch, Google

9:00 am–12:30 pm

Grand Ballroom C

This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Lustre, GPFS, PanFS, HDFS (Hadoop File System), OpenStack, and the NFSv4.1 standard for parallel I/O.

Additional Materials:

Half Day Afternoon

Sam H. Noh, Hongik University

Sam H. (Hyuk) Noh received his B.S. in Computer Engineering from Seoul National University in 1986, and his Ph.D. from the Department of Computer Science, University of Maryland, College Park, in 1993. He has been a professor in the School of Computer and Information Engineering at Hongik University in Seoul, Korea, since 1994. He has worked on various software issues pertaining to flash memory since 1999, authoring numerous papers and holding numerous patents in that area. He has served as General Chair, Program Chair, and Program Committee member for a number of technical conferences and workshops, including the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), the IEEE International Conference on Parallel and Distributed Systems (ICPADS), the USENIX Conference on File and Storage Technologies (FAST), and the International World Wide Web (WWW) Conference. He also serves as Associate Editor of ACM Transactions on Storage. His other current research interests include operating system issues pertaining to non-volatile memory, such as PCM and STT-MRAM.

Yang-Suk Kee, Samsung Semiconductor, Inc.

Dr. Yang-Suk Kee (Yang Seok Ki) is a director and architect of the Memory Solutions Lab, Samsung Semiconductor Inc. America. He leads the Advanced Datacenter Solutions group, whose main focus is to innovate the SSD ecosystem, and drives a storage-centric computing paradigm called Smart SSD. Before joining Samsung, he worked for Oracle's server technology group, which builds a distributed database server system, and contributed to the Oracle 12c release. Prior to his industrial experience, he worked on HPDC (High Performance Distributed Computing), Grid, and Cloud research at the Information Sciences Institute of the University of Southern California and the Center for Networked Systems, University of California, San Diego. He received his Ph.D. in Electrical Engineering and Computer Science in parallel processing, his M.S. in Computer Engineering, and his B.S. in Computer Engineering from Seoul National University, Korea.

M3
Flash Memory and Its By-products: A to Z in a Flash
Sam H. Noh, Hongik University

Yang-Suk Kee, Samsung Semiconductor, Inc.

1:30 pm–5:00 pm

Grand Ballroom B

This tutorial will be a crash course on flash memory. We will cover the major ground related to flash memory-based products, starting from the intrinsic characteristics of flash memory devices, moving up to the FTL firmware that controls them, and finally to the system software layer that makes use of flash memory-based end products. We start off covering the history and the very basics of each layer, then discuss the recent trends in each layer and how the layers differ across the various flash products that are commercially available. Finally, we will attempt to untangle the close-knit relationship among the system, the software, the interface, and the market that together produce flash memory-based end products and the software systems that use them.

Additional Materials:
M4
Architecting Applications on Hadoop
1:30 pm–5:00 pm

Grand Ballroom C

During the first half of the tutorial, we will provide an intro to Apache Hadoop and the ecosystem. In the second half, we will show, using an end-to-end application of clickstream analytics, how users can:

  • Model data in Hadoop and select optimal storage formats for data stored in Hadoop
  • Move data between Hadoop and external systems such as relational databases and logs
  • Access and process data in Hadoop
  • Orchestrate and schedule workflows on Hadoop

Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.

Students should bring laptops with a copy of the Cloudera Quickstart VM (or access to a working alternate VM or Hadoop cluster). The VM can be downloaded from here.

Requirements are:
These are 64-bit VMs. They require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
To use a VMware VM, you must use a player compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can be used to create a new VM using the same virtual disk (VMDK file), but some features in VMware Tools won't be available.

The VM and file size vary according to the CDH version as follows:

CDH and Cloudera Manager Version                                   RAM Required by VM   File Size
CDH 5 and Cloudera Manager 5                                       4 GB                 3 GB
CDH 4, Cloudera Impala, Cloudera Search, and Cloudera Manager 4    4 GB                 2 GB
Additional Materials:
Sunday, November 8, 2015

Full Day

Salim Virji, Google

Salim Virji is a Site Reliability Engineer at Google. He has worked on infrastructure software, back-end systems, front-end applications, and various pieces of glue to connect them all. He lives and works in New York City.

S1
SRE Classroom
Salim Virji, Google

9:00 am–5:00 pm

Thurgood Marshall West

With this hands-on tutorial, you will develop an understanding for designing, building, and running reliable Internet services at a large scale.

Who should attend:

This tutorial is suitable for executives who need to specify and evaluate systems, engineers who build systems, and IT professionals who want to run first-class services built with reliable systems.

Take back to work:

You will take back an understanding of how to evaluate system designs; how to specify and build large systems; and how to operate these systems in the real world, in a way that will scale as the system grows.

Topics include:
  • Designing Reliable Systems
  • Building Reliable Systems
  • Running Reliable Systems
Additional Materials:

You may want a text editor and a calculator, but most of this work can be done with pencil and paper.

Tyler Fitch, Chef

Tyler is a member of Chef’s Customer Engineering team, championing successful patterns and delightful experiences in automation to enterprise customers large and small. Prior to working at Chef, he spent a decade as an engineer for Adobe, developing and automating commerce services for adobe.com using a variety of technologies. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

S3
Automation Tools Bootcamp
Tyler Fitch, Chef

9:00 am–5:00 pm

Thurgood Marshall South

The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new development tools. We will learn about and then use Git, Vagrant, Chef, Packer, Docker, and Jenkins to build a small application in local VMs.

We will cover a progression of tasks, leveraging information from previous sections to build a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

Who should attend:

Operations, QA, developers, those who choose to call themselves DevOps, and even managers can come learn.

Take back to work:

These automation tools are freely available to engineers, enabling them to safely break environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

Topics include:
  • Git
  • Vagrant
  • Chef
  • Packer
  • Docker
  • Jenkins
Additional Materials:

Having Vagrant and VirtualBox installed will allow students to use a VM that will be provided. The VM will have everything else pre-installed. The VM will be available for download before or after the class. URL TBA.

Half Day Morning

Chris "Mac" McEniry, Sony Network Entertainment

Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

S2
Go for Sysadmins
Chris "Mac" McEniry, Sony Network Entertainment

9:00 am–12:30 pm

Thurgood Marshall North

Every day, sysadmins find the need to build tools that get the job done in a fast and efficient way. Go is a new and powerful—yet simple—language that is an excellent fit for the needs of the harried sysadmin. This tutorial provides an introduction to Go with a focus on using it for everyday tooling.

Who should attend:

Sysadmins interested in the Go language for applications and tooling. Some experience with programming or scripting languages such as Perl, Python, or Ruby is recommended.

Take back to work:

An introductory understanding of the Go language, and several examples of common problems to learn from.

Topics include:
  • Understanding of the applicability of Go to everyday work
  • The Go environment and toolset
  • Go language fundamentals:
  • Control Structures
  • Functions
  • Data Types
  • Methods and Interfaces
  • Goroutines
  • Channels
  • Examples in Go:
  • Email
  • Web Server
  • Directory Tools
Additional Materials:

A current Linux VM with golang installed is recommended; see http://golang.org/doc/install. However, the exercises can also be worked through using the packages from that site directly.

Rajesh Joseph, Red Hat, Inc.

Rajesh Joseph is a Principal Software Engineer at Red Hat. He is currently working with the Protocols Team for Red Hat Gluster Storage (RHGS) that is responsible for implementation of the SMB and NFS family of protocols for RHGS. He is also the maintainer of the Snapshot feature in GlusterFS. He has 13 years of industry experience in information security, cryptography, storage, and file-system domain. Prior to Red Hat, he worked with Cisco and Nokia.

Poornima Gurusiddaiah, Red Hat, Inc.

Poornima Gurusiddaiah works as a software engineer at Red Hat. With two years of development experience with Gluster, she has contributed to various Gluster projects including Samba Gluster integration, libgfapi, and multiprotocol for GlusterFS. Prior to Red Hat, she worked with HP on an openVMS kernel project. Poornima's areas of interest include Gluster, storage, Samba and related technologies.

S4
GlusterFS: A Scale-out Software Defined Storage
Rajesh Joseph, Red Hat, Inc.

Poornima Gurusiddaiah, Red Hat, Inc.

9:00 am–12:30 pm

Thurgood Marshall East

Learn about GlusterFS, a scale-out, software-defined storage platform capable of scaling to many petabytes. GlusterFS aggregates storage building blocks on diverse commodity hardware over InfiniBand RDMA or TCP/IP interconnect in a single global namespace. GlusterFS is based on a stackable user space design and can deliver exceptional flexibility and performance for diverse workloads.

This tutorial will combine lecture with instructor-led demonstrations using GlusterFS.

Who should attend:
  • Any system/storage administrator who wants an in-depth understanding of GlusterFS.
  • Anyone who wants to know how GlusterFS can help their storage infrastructure and contain costs.
Take back to work:
  • Thorough understanding of GlusterFS
  • Virtual Machine images for playing with GlusterFS
  • Lab instructions on how to set up and play with a test cluster
Topics include:
  • Overview of the GlusterFS architecture
  • Features offered by GlusterFS
  • Different access methods to GlusterFS (Samba, NFS, FUSE)
  • Hands-on using GlusterFS
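As a flavor of the hands-on portion, a minimal sketch of bringing up a two-node replicated volume with the stock CLI (hostnames, brick paths, and the mount point are hypothetical):

```shell
# on server1: add the second node to the trusted storage pool
gluster peer probe server2

# create and start a 2-way replicated volume, one brick per node
gluster volume create gv0 replica 2 \
    server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0

# on a client: mount the volume over FUSE into the single global namespace
mount -t glusterfs server1:/gv0 /mnt/gluster
```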
Additional Materials:

Lee Damon, University of Washington

Lee Damon chaired the SAGE policies and ethics working groups that developed the original SAGE/LOPSA code of ethics. He has a B.S. in Speech Communication from Oregon State University. A UNIX system administrator since 1985, he has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and Qualcomm. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. He chaired LISA '04 and co-chaired CasITconf '11, '13, and '14.

S5
Professional Conduct and Computer Ethics
Lee Damon, University of Washington

9:00 am–12:30 pm

Lincoln 4

This introductory tutorial will start by examining some of the ethical responsibilities that come along with access to other users' data, accounts, and confidential information. We will look at several case studies involving both local and Cloud usage. All attendees are strongly encouraged to participate in the discussion. Numerous viewpoints will be considered in order to give students a perspective from which to develop their own reasoned response to ethical challenges.

The answers to these and other problems are often far more complicated than one would initially guess.

There are no prerequisites for this class.

Who should attend:

People in any of the following situations would benefit from this tutorial:

  • system administrators
  • managers of system administrators
  • those who have access to personal/confidential information
  • those who make policy decisions about computer systems and their users

Take back to work:

After completing this tutorial, students will be better able to resolve questionable situations and will have the means to support their decisions.

Topics include:
  • Why it is important to set your ethical standards before issues arise
  • Who is impacted by "expectations of ethical conduct"
  • Why this isn't just an expectation of system administrators
  • Implicit expectations of ethical behavior
  • Ethics and the Cloud
  • Coercion to violate ethics
  • Well-intentioned violations of privacy
  • Collection, retention, and protection of personal data
  • Management directives vs. friendships
  • Software piracy/copying in a company/group/department
Additional Materials:

Half Day Afternoon

David Josephsen, Librato

As the developer evangelist for Librato, Dave Josephsen hacks on tools and documentation, writes about statistics, systems monitoring, alerting, metrics collection and visualization, and generally does anything he can to help engineers and developers close the feedback loop in their systems. He’s written books for Prentice Hall and O’Reilly, speaks Go, C, Python, Perl and a little bit of Spanish, and has never lost a game of Calvinball.

S6
Neighborly Nagios
David Josephsen, Librato

1:30 pm–5:00 pm

Thurgood Marshall East

Neighborly Nagios is a beginner-level tutorial on the venerable monitoring system. Rather than teaching Nagios in a vacuum, this tutorial emphasizes the important role played by centralized, polling-based monitoring systems like Nagios in the context of a modern engineering organization. We will focus on the hooks and I/O intricacies systems engineers need to make Nagios play nicely with other commonly used monitoring tools.

Who should attend:

Sysadmins interested in learning about Nagios.

Take back to work:

Attendees will leave with a solid conceptual understanding of Nagios, as well as of centralized, polling-based monitoring systems in general. They will have configured Nagios to run local and remote service and host checks, and to send email and webhook notifications. They will also have configured Nagios to work with several visualization systems and to emit logs into a centralized event-logging infrastructure.

Topics include:

Why learn Nagios?

  • Centralized pollers in theory and practice

Reality through the eyes of Nagios

  • Hosts, services, and plugins
  • Server-side vs. client-side plugins and remote execution
  • Active polling vs. passive checks
  • People (contacts, contact groups, and notification commands)

Installing and configuration

  • Nagios configuration layout and templating
  • Macros and commands
  • Templates for hosts, services, and contacts
  • A simple template setup using hostgroups
  • A flexible template setup
  • Managing your configs
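The template mechanism can be sketched with a minimal object configuration (host names, addresses, and group names here are hypothetical):

```
define host {
    name                 linux-base       ; a template, not a monitored host
    check_command        check-host-alive
    max_check_attempts   3
    contact_groups       admins
    register             0                ; templates themselves are never registered
}

define host {
    use                  linux-base       ; inherit every directive above
    host_name            web1
    address              192.0.2.10
    hostgroups           web-servers
}
```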

Data-In: Plugins and data collection frameworks

  • Three simple, server-side plugins
  • NRPE
  • NRDP
  • Check_MK
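A server-side plugin is just a program that prints one status line and exits with a conventional code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). A minimal sketch in Python, with hypothetical disk-usage thresholds:

```python
# Nagios plugin convention: the exit code carries the state, stdout carries a
# one-line summary; anything after "|" is perfdata for graphing tools.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_disk_pct(used_pct, warn=80, crit=90):
    """Return (exit_code, output_line) for a disk-usage percentage."""
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    line = "DISK %s - %d%% used | disk=%d%%;%d;%d" % (
        label, used_pct, used_pct, warn, crit)
    return code, line

# a real plugin would measure usage (e.g. via os.statvfs), print the line,
# and call sys.exit(code)
```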

Data-Out: Hooks and data export strategies

  • Performance Data and Time Series
  • Nagios Event Radio Dispatch (NERD)
  • PNP4Nagios and RRDTool
  • Graphios and Graphite
  • MK LiveStatus

Neighborly Nagios: One part of an effective monitoring infrastructure

  • Data as a first-class citizen
  • Composable telemetry processing infrastructure
  • Intro to Statsd
  • Intro to Heka
  • Intro to Riemann
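Statsd, for instance, accepts metrics as tiny plain-text UDP datagrams of the form name:value|type (c = counter, g = gauge, ms = timer). A minimal Python emitter (the metric names are made up):

```python
import socket

def statsd_packet(name, value, mtype):
    """Encode one metric in the statsd line protocol."""
    return ("%s:%s|%s" % (name, value, mtype)).encode("ascii")

def send_metric(addr, name, value, mtype="c"):
    # UDP is deliberate: emitting metrics can never block the sender
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(statsd_packet(name, value, mtype), addr)
    sock.close()

# e.g. send_metric(("localhost", 8125), "nagios.checks.run", 1)
```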
Additional Materials:

Attendees are encouraged to have a laptop with Vagrant installed.

Adam Moskowitz, MathWorks

Adam Moskowitz is a Senior Tools Engineer at MathWorks and was the program chair of the LISA ‘09 conference. Since entering the field in 1978, he has been a computer operator (what sysadmins were called in The Good Old Days), an application developer, a system administrator, and a teacher of all of those things. At MathWorks, he is building a system to manage containers in both production and ephemeral testing environments (and to quietly take over the world). When he’s not writing code, you will find him judging barbecue contests or riding a tandem bicycle with his wife.

S7
Software Testing for Sysadmin Programs
Adam Moskowitz, MathWorks

1:30 pm–5:00 pm

Thurgood Marshall North

This tutorial will focus on how to incorporate testing into the software that sysadmins develop. Programs typically written by sysadmins tend to be less amenable to testing for a variety of reasons: choice of language, lack of "OO-ness," and heavy interaction with "system programs" (things like "mount" and "df"). This class will address these issues and show ways to incorporate testing into existing programs without having to rewrite them; it will also demonstrate how to use testing when developing new programs when "going OO" isn't an option. The session will consist of lecture, discussion, and hands-on exercises.

Who should attend:

System administrators of any level with at least some experience writing programs in pretty much any scripting language (Python, Ruby, Perl, Bash, or Groovy).

Take back to work:
  • An understanding of software testing and test-driven development, and how to incorporate these concepts into system administration
  • Suggestions for making future programs more testable
  • Exposure to a testing framework (Maven plus Spock)
  • Experience developing basic unit tests
  • Some exposure to simple "mocking" techniques
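As a taste of the "mocking" bullet above — sketched here in Python's unittest.mock purely for flavor, rather than the class's Maven/Spock/Groovy stack — the idea is to isolate logic that shells out to a "system program" (a hypothetical df wrapper) so its parsing is testable without running the real command:

```python
import subprocess
from unittest import mock

def disk_used_pct(mount):
    """Parse the Use% column for one mount point out of `df -P` output."""
    out = subprocess.check_output(["df", "-P", mount], text=True)
    # the data row is last; Use% is the second-to-last column, e.g. "83%"
    return int(out.strip().splitlines()[-1].split()[-2].rstrip("%"))

FAKE_DF = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 100000 83000 17000 83% /var
"""

# replace the call to df with canned output -- no real df is executed
with mock.patch("subprocess.check_output", return_value=FAKE_DF):
    assert disk_used_pct("/var") == 83
```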
Topics include:

Motivation and benefits

  • Why use testing
  • Why testing is needed
  • How it can help
  • Why it's not as difficult as people think

Basic testing

  • Intro to Maven (the test "runner")
  • Intro to Spock (the test framework)
  • Intro to Groovy (the testing language)
"Mocks" (or "Mocking")

  • How to simulate interaction with other programs
Additional Materials:

    Mike Ciavarella, Coffee Bean Software Pty Ltd

    Mike Ciavarella has been producing and editing technical documentation since he naively agreed to write application manuals for his first employer in the early 1980s. His first UNIX system administration role was on a network of Sun workstations in 1991, where he built his first firewall. Since that time, Mike has made a point of actively promoting documentation and security as fundamental aspects of system administration. He has been a technical editor for Macmillan Press, has lectured on software engineering at the University of Melbourne (his alma mater), and has provided expert testimony in a number of computer security cases.

    S8
    Documentation Techniques for Sysadmins
    Mike Ciavarella, Coffee Bean Software Pty Ltd

    1:30 pm–5:00 pm

    Lincoln 4

    Sysadmins freely acknowledge how important documentation is to their daily lives, and in the same sentence will loudly complain that they don’t have time to produce documentation. This class is about how to produce effective, useful and timely documentation as part of your normal sysadmin activities. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

    Who should attend:

    System administrators of all types and levels who need to produce documentation for the systems they manage, or who want to improve their documentation skills. Documentation can be the difference that turns you from a good sysadmin into a great sysadmin!

    Take back to work:
    • The skills to improve personal and team documentation quality
    • A solid understanding of how to establish and maintain effective documentation practices
    Topics include:
    • Why system administrators need to document
    • Documentation as part of your daily workflow
    • Targeting your audience
    • Common mistakes made in documentation
    • Tools to assist the documentation process (including effective use of wikis)
    Additional Materials:
    Monday, November 9, 2015

    Full Day

    Caskey L. Dickson

    Caskey Dickson is an engineer with an MBA. Currently, he is a Site Reliability Engineer/Software Engineer at Google, where he works writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines.

    Caskey has worked in online service development since 1995. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Masters in Systems Engineering, and an M.B.A from Loyola Marymount.

    M1
    Everything an Administrator Wanted to Know about Operating System Internals but Was Afraid to Ask
    Caskey L. Dickson

    9:00 am–5:00 pm

    Thurgood Marshall South

    The Morning Session (Part 1) of this tutorial is FULL.

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause behind "it's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis. As an added bonus, it will prime you for the following day's "Linux Performance Tuning" tutorial with Theodore Ts'o.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.

    Buckle up.
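As one concrete taste of that material, a minimal FIFO page-replacement simulator showing Bélády's anomaly: on the classic reference string, adding a fourth frame makes the fault count go up, not down.

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with `frames` physical frames."""
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:        # memory full: evict the oldest page
                resident.discard(order.popleft())
            resident.add(page)
            order.append(page)
    return faults

REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# fifo_faults(REFS, 3) == 9, but fifo_faults(REFS, 4) == 10 -- more RAM, more misses
```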

    Who should attend:
    • All admins who did not take the Comp-Sci academic route and never had a course in OS internals
    • Inexperienced admins whose coursework or training didn't include the kind of OS internals that they should (modern OS courses have become a shadow of their former selves and commonly require writing no OS code)
    • More experienced admins who haven't really had to address these sorts of issues on a regular basis who probably know a lot about some individual aspects but could benefit from everything being put into a broader context
    Take back to work:

    Attendees will gain a deeper understanding of what goes on inside the kernel and the areas where things can go wrong. We'll explore how little the concept of "system load" captures about the true system state, and attendees will be prepared to improve both their operational response methodologies as well as their monitoring goals.

    Topics include:

    Morning:

    • Scheduling and Process Management
    • Memory Management and the MMU
    • Virtualization and its impact on these

    Afternoon:

    • File System Architecture for sysadmins, covering ext2/3/4, NTFS, and ZFS
    • Storage layer performance, disks, RAID, and SANs
    • The impact of virtualization on these
    Additional Materials:

    All attendees should have either a laptop running Linux, or a VM with a recent copy of Linux installed to assist in the hands-on exploration activities.

    Branson Matheson, sandSecurity

    Branson is a 27-year veteran of system architecture, administration and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle projects, TSA security and monitoring systems, and Internet search engines. He has also run his own company while continuing to support many open source projects. He is currently the CIO for Silent Circle, a privacy and security organization behind the Blackphone. Branson has his CEH, GSEC, GCIH, and several other credentials; and generally likes to spend time responding to the statement "I bet you can't."

    M3
    Defending against the Dark Arts
    Branson Matheson, sandSecurity

    9:00 am–5:00 pm

    Thurgood Marshall North

    Today's threats to the enterprise are manifested in many ways, but all share similar traits: highly intelligent, well funded and determined to gain access. In this class, we will explore the murky world of the blackhats. We will examine your security footprint as they view it, and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. As with all my classes, this will be accompanied with a pinch of humor and a large dollop of common sense.

    Who should attend:

    Beginning to high-level system administrators of any stripe with an interest in IT Security and a desire to understand their potential adversaries. It is suggested that participants have experience with *nix command line and virtual hosts.

    Take back to work:

    Tools, tips, tricks and a working security toolkit that can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

    Topics include:
    • Understanding an attack from beginning to end
    • Security Podiatry
    • Social Engineering detection and prevention
    • Detecting undesirable network and server behavior
    • Detecting penetration attempts and successes, and stopping them
    • Raising awareness
    Additional Materials:

    Bring a laptop with VMware Fusion or VirtualBox installed. You should have wireless or wired connectivity; we will have both available in the class. You will need to acquire the VM for the class beforehand; registered attendees will receive the download information via email.

    Half Day Morning

    Robert Ballance, Independent

    Dr. Robert Ballance honed his R-programming skills as a System Manager for High-Performance Computing Systems at Sandia National Laboratories, where he developed several R packages for system analysis and reporting. He has administered systems and managed, consulted, taught, and developed software, including R packages, Perl applications, compilers, and UNIX device drivers. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group.

    M2
    An Introduction to R for System Administrators
    Robert Ballance, Independent

    9:00 am–12:30 pm

    Thurgood Marshall West

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for ensuring reproducible and repeatable results.

    This tutorial is designed to

    • motivate you to pick up R,
    • demonstrate useful techniques using R, and
    • illustrate ways to simplify your life by automating data analysis and reporting.

    Examples will be based on situations that the instructor encountered during routine system operations. Additional exercises and data sets that students can explore following the workshop will be provided. The instructor will be available in the LISA Lab after the workshop. 

    Who should attend:

    System administrators who are awash in operational data and want to do a more effective job of understanding their data and communicating their findings should attend this class. Prior knowledge of R is not required, but if you are already working with R, you are welcome! Facility with programming and a knowledge of basic descriptive statistics will be assumed.

    Take back to work:

    This introduction to R and its ecosystem provides a walk along the R main line—coming up to speed on R, accessing data, analyzing data, and getting the message out. The key points include:

    • Acquaintance with R, R packages, and R Studio
    • Understanding where R fits into the system administrator’s tool set
    • Familiarity with basic R data manipulation techniques
    • Basic principles for ensuring reproducible and automated analyses 
    • Motivation to learn or improve your R skills
    • Next steps in mastering R
    Topics include:
    • Introduction to the R ecosystem (R, R Studio, CRAN)
    • Why should you consider R?
    • The R programming model: functions, tables, and packages 
    • The basic data analysis workflow
    • Reading and writing data from files and pipes
    • Data frames and data frame manipulations
    • Using the plyr and dplyr packages to slice and dice data
    • Using the ggplot2 package for graphing
    • Overview of the R package system
    • Other useful R packages
    Additional Materials:

    The following software packages should be installed in advance of the tutorial:

    R: Version 3 or later from CRAN or as supported by your OS. Some distributions have packages already available; others have both pre-compiled binaries and source code available from CRAN. R requires about 160MB (installed) on Mac OS X. Source code is available. License: GNU General Public License (GPL).

    Optional: R Studio. Requires R, plus an additional 305MB on Mac OS X. Binary installations are available for Ubuntu, Fedora, Mac OS X, and Windows. Source code is available. R Studio does not appear to be supported for the BSD distributions. License: GNU Affero General Public License.

    Once you have installed R and (optional) R Studio, you can download contributed packages. Class demonstrations will use plyr and ggplot2. Other packages will be introduced as needed.

    Todd Palino, Staff Site Reliability Engineer, LinkedIn

    Todd is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping the largest deployment of Apache Kafka, Zookeeper, and Samza fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Previously, Todd was a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company.

    M4
    Apache Kafka from Scratch
    Todd Palino, Staff Site Reliability Engineer, LinkedIn

    9:00 am–12:30 pm

    Thurgood Marshall East

    Apache Kafka is a publish/subscribe messaging system that is in use within companies including LinkedIn, Twitter, Netflix, and many others. It is used to build Extract, Transform, and Load (ETL) pipelines, collect metrics and logs, and queue data between applications, often providing the main backbone for moving data within big data infrastructures. This tutorial will focus on how to get started with Kafka, including working with ZooKeeper, which it depends on. We will cover installation, configuring retention and replication, and creating simple applications for producing and consuming messages.

    Who should attend:

    This tutorial is designed for engineers, both operations and development, who are new to Apache Kafka and publish/subscribe messaging. The only prerequisite knowledge for participating is the ability to install software and execute basic shell commands. The ability to write basic Python programs is helpful, but not required. Full working versions of all scripts used in the tutorial will be provided.

    Take back to work:

    Participants will leave the tutorial with an understanding of how to set up Apache ZooKeeper and Apache Kafka, and create message producers and consumers, having completed this work on their own systems. This will allow them to set up a publish/subscribe messaging infrastructure that can be used for myriad applications, including monitoring, logging, queuing, and tracking user-generated events.

    Topics include:

    ZooKeeper

    • What is ZooKeeper?
    • What is it NOT?
    • Standalone Setup

    Apache Kafka

    • Publish/Subscribe Messaging
    • Kafka Architecture
    • Installing Kafka

    Producing Messages

    • Message Schema
    • Using the Console Producer
    • Producing Inside Applications

    Consuming Messages

    • Using the Console Consumer
    • Consuming Inside Applications
    • Limitations of Non-Java Consumers

    Kafka Clusters

    • Adding a Second Broker (partner work)
    • Replicating Partitions
    • Creating Multiple Partitions

    Message Retention

    • Retention by Size
    • Retention by Key (log compacted)

    Use Cases

    • Monitoring
    • Log Collection
    • User-generated Events
    • Queuing
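The end-to-end flow can be sketched with the console tools that ship in the Kafka distribution (the relative paths and single-broker localhost setup are assumptions):

```shell
# start ZooKeeper and a single Kafka broker with the sample configs
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# create a one-partition topic
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic test --partitions 1 --replication-factor 1

# produce one message, then read everything back
echo "hello, kafka" | bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic test --from-beginning
```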
    Additional Materials:

    The exact requirements are to be determined. Each attendee will be required to bring a laptop, preferably Mac or Linux, on which they can install required software, including a Java development kit (if not already installed), Apache ZooKeeper, and Apache Kafka. More detailed instructions will be provided prior to the conference.

    Scott Cromar, Author, "From Techie to Boss"

    Scott Cromar is an experienced IT manager who still remembers what it was like to step into his first leadership position from a technical role. He has assembled diverse, multifunctional, globally distributed operational teams for several employers over his career, and he enjoys the challenge of creating a team from a group of talented individuals.

    M5
    Survival Guide for the New Manager
    Scott Cromar, Author, "From Techie to Boss"

    9:00 am–12:30 pm

    Lincoln 4

    You're respected by people inside and outside of your group. When there are hard problems to be resolved, you're the person who makes things work on time and under budget. More than once, you've pulled a complete miracle out of your hat. And when the team needs a new manager or team lead, you're the person at the top of the list.

    Unfortunately, the skills that make a good technical staff member do not always translate well to management. How do you make the leap? Scott’s presentation will help you prioritize and organize the tasks that every successful manager has to master.

    Who should attend:

    Technical people who are assuming (or are interested in assuming) a leadership role in their organizations.

    Take back to work:

    This class will share some of the tips and skills that I have learned the hard way, by transitioning from a solid performer to a team manager. You will become aware of the many resources available to help you make the transition.

    Topics include:
    • Defeating the "Peter Principle"
    • Becoming a good leader
    • The manager’s toolkit
    • Your 90-day transition plan 
    • Assessing team strengths, weaknesses, and needs
    • Building your team
    • Evaluating team members
    • Managing expectations and relationships
    • Creating your learning plan
    • Measuring success
    • Fitting the right strategy to your situation
    • Time management
    • Effective meetings
    • Managing projects effectively
    • Conflict management
    Additional Materials:

    This session includes a number of pen-and-paper exercises.

    Half Day Afternoon

    Matt Harrison, MetaSnake

    Matt Harrison is a consultant and corporate trainer at MetaSnake, focusing on Python and Data Science. He has been using Python since 2000 across the domains of search, build management and testing, business intelligence, and storage.

    Matt also runs pycast.io, a screencasting service providing instruction on Python and Data Science. He occasionally tweets useful Python-related information at @__mharrison__.

    M6
    Hands-on Introduction to Python Analytic Stack
    Matt Harrison, MetaSnake

    1:30 pm–5:00 pm

    Thurgood Marshall East

    Python is in high demand. In addition to being used purely for development, Python programming is one of the top skills for data scientists because it is a full-stack analytics package. You can access data with it (or crawl the web to gather data), slice it and dice it, throw it into a database, visualize it, and perform machine learning with it.

    This course will cover some of the tools that data scientists are using to analyze data. Specifically, we will introduce the IPython Notebook (Jupyter), the pandas toolkit, and the plotting facilities in matplotlib.

    Who should attend:

    Developers or admins who know Python or another language and want to learn about the analytic stack, specifically IPython Notebook, pandas, and matplotlib.

    Take back to work:

    Attendees will return to work with a basic understanding of the Python tools for data analysis.

    Topics include:
    • Anaconda Distribution
    • IPython Notebook
    • Navigation in Notebook
    • Executing code in Notebook
    • pandas Introduction
    • Getting data
    • Cleaning data
    • Examining data
    • Filtering, joining and updating data
    • Working with aggregates
    • Creating pivot tables
    • Plotting Introduction
    • matplotlib architecture
    • Line plots
    • Histograms
    • Box Plots
    • Tweaking axis, labels, legends
    Additional Materials:

    Attendees should have the (free) Anaconda stack installed on their machine. This is a large download, so please do it before the class. Downloads for Windows, Mac, and Linux can be found at http://continuum.io/downloads
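
    The pandas workflow outlined above can be sketched in a few lines. The data here is a made-up inline table standing in for a real dataset, and "region" and "units" are hypothetical column names:

```python
import pandas as pd

# Hypothetical inline data standing in for a real dataset;
# "region" and "units" are made-up column names.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units":  [10, 3, 7, 12],
})

# Examining data
print(df.describe())

# Filtering
east = df[df["region"] == "east"]

# Working with aggregates
totals = df.groupby("region")["units"].sum()
print(totals)
```

    Inside the IPython Notebook, each of these steps would typically live in its own cell so intermediate results can be inspected as you go.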

    Alison Chaiken, Mentor Graphics

    Alison Chaiken has been working with and contributing to systemd as part of her work on automotive Linux systems programming at Mentor Graphics' Embedded Software Division. She has given several presentations about systemd, most notably at the 2015 meetings of the Southern California Linux Expo and the Embedded Linux Conference.

    M7
    systemd, the Next-Generation Linux System Manager
    Alison Chaiken, Mentor Graphics

    1:30 pm–5:00 pm

    Thurgood Marshall West

    Systemd is now at the heart of all major Linux distributions. Its fundamental design principle is to pull duplicate daemon-management code out of SysVinit's bash scripts and migrate it into systemd's core or the kernel. systemd is modular, asynchronous, and fully configurable via ASCII-formatted declarative config files.

    Attendees will learn to configure, diagnose, and control systemd using its native tools such as systemctl and journalctl. Hands-on examples will show systemd's unit files and directory hierarchy, illustrate socket-based activation, and demonstrate management of resources and security. Time permitting, attendees will learn about systemd and containers.

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Familiarity with systemd's configuration, tools, and basic principles.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system monitoring with journald
    • Resource management via systemd's cgroups interface 
    • Simple security management with systemd and the kernel's capabilities
    • systemd and virtualization
    Additional Materials:

    If network connectivity permits, we will try to offer virtual machine instances with systemd baked in. However, attendees should be able to work the examples on any fully patched, up-to-date Linux system that includes systemd and that can be rebooted. A Linux instance inside a VM will be most convenient in either case, but native installation should also work.
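
    As a taste of the unit-file anatomy covered in class, here is a minimal sketch of a service unit. The service name "myapp" and all directive values are hypothetical illustrations, not recommendations:

```ini
# /etc/systemd/system/myapp.service -- hypothetical example unit
[Unit]
Description=Example application service
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp --serve
Restart=on-failure
# Resource management via systemd's cgroups interface
MemoryLimit=512M
# Simple security hardening via the kernel's capabilities
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
```

    After `systemctl daemon-reload`, such a unit would be started with `systemctl start myapp` and inspected with `journalctl -u myapp`.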

    Thomas A. Limoncelli, Stack Overflow

    Tom is an internationally recognized author, speaker, and system administrator. His new book, The Practice of Cloud System Administration, launched last year. His past books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Pearson). In 2005, he received the USENIX LISA Outstanding Achievement Award.

    He works in New York City at Stack Exchange, home of Careers.Stackoverflow.com, and previously worked at Google and Bell Labs. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    M8
    Introduction to Time Management for Busy Devs and Ops
    Thomas A. Limoncelli, Stack Overflow


    1:30 pm–5:00 pm

    Lincoln 4

    Devs and Ops have a time management problem: There are too many projects. Too many interruptions. Too many distractions. This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • Why typical “time management” strategies don’t work for sysadmins
    • What makes “to-do” lists fail, and how to make them work
    • How to eliminate “I forgot” from your vocabulary
    • How to manage interruptions: preventing them, managing the ones you get
    • Delegating to coworkers without them knowing
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Task grouping: batching, sharding, and multitasking
    • Handling situations like a big outage disrupting your perfectly planned day
    Additional Materials:
    Tuesday, November 10, 2015

    Full Day

    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux kernel developer and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation and is currently employed at Google. Theodore is a Debian developer and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    T1
    Linux Performance Tuning
    Theodore Ts'o, Google

    9:00 am–5:00 pm

    Thurgood Marshall South

    The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

    This class will cover the tools that can be used to monitor and analyze a Linux system as well as key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • File system and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies
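
    A few of the memory and network tuning knobs above are typically exposed through sysctl. The values below are purely illustrative, not recommendations; appropriate settings depend on the workload being characterized:

```ini
# /etc/sysctl.d/99-tuning.conf -- illustrative values only
# Reduce the kernel's tendency to swap out application memory
vm.swappiness = 10
# Allow more dirty page cache to accumulate before forcing writeback
vm.dirty_ratio = 20
# Larger maximum TCP receive/send buffers for high-bandwidth links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```

    Settings in such a file would be applied with `sysctl --system`, or tested one at a time with `sysctl -w`.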
    Additional Materials:

    Matt Harrison, MetaSnake

    Matt Harrison is a consultant and corporate trainer at MetaSnake, focusing on Python and Data Science. He has been using Python since 2000 across the domains of search, build management and testing, business intelligence, and storage.

    Matt also runs pycast.io, a screencasting service providing instruction on Python and Data Science. He occasionally tweets useful Python-related information at @__mharrison__.

    T2
    Hands-on Intermediate Python
    Matt Harrison, MetaSnake

    9:00 am–5:00 pm

    Thurgood Marshall East

    Got the basics of Python down but want to dig in a little more?

    Have you wondered about functional programming, closures, decorators, context managers, generators, or list comprehensions, when you should use them, and how to test them? This hands-on tutorial will cover these intermediate subjects in detail. 

    Who should attend:

    Developers with a basic understanding of Python who want to learn more advanced features

    Take back to work:

    You will go back to work with a knowledge of advanced Python constructs and idioms, and know when to use them.

    Topics include:
    • Closures
    • Decorators
    • Class decorators
    • Properties
    • Context managers
    • List comprehensions
    • Iterator pattern
    • Generators
    Additional Materials:

    Please install Python (2 or 3) on your laptop. Most Linux and Mac computers already have Python installed. Windows users can find a download at python.org.
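
    Several of the constructs on the topics list (closures, decorators, generators, and list comprehensions) fit in a short sketch. The names `trace`, `square`, and `countdown` are hypothetical, chosen only for illustration:

```python
import functools

def trace(func):
    """Decorator: wrap func and record each call's arguments in
    `calls`, a list captured by closure."""
    calls = []
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = calls
    return wrapper

@trace
def square(x):
    return x * x

def countdown(n):
    """Generator: lazily yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

# A list comprehension driving the decorated function
squares = [square(i) for i in range(4)]
```

    Here `wrapper` closes over `calls`, so every decorated function carries its own call history, and `countdown` produces values only as they are consumed.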

    Nick Feamster, Princeton University

    Nick Feamster is a professor in the Computer Science Department at Princeton University. Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, with a focus on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the SIGCOMM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), the NSDI conference (fault detection in router configuration), USENIX Security (circumventing web censorship using Infranet), and USENIX Security (web cookie analysis).

    T3
    Software Defined Networking: Principles and Practice
    Nick Feamster, Princeton University

    9:00 am–5:00 pm

    Thurgood Marshall North

    This course introduces software defined networking, an emerging paradigm in computer networking that allows a logically centralized software program to control the behavior of an entire network.

    Separating a network's control logic from the underlying physical routers and switches that forward traffic allows network operators to write high-level control programs that specify the behavior of an entire network. This is in contrast to conventional networks, where network operators must codify functionality in terms of low-level device configuration.

    Logically centralized network control makes it possible for operators to specify more complex tasks that involve integrating many disjoint network functions (e.g., security, resource control, prioritization) into a single control framework. This allows network operators to create more sophisticated policies and makes networks easier to configure, manage, troubleshoot, and debug.

    Who should attend:

    Network operators, software developers, graduate students, and other professionals who want to gain hands-on experience with SDN.

    Take back to work:

    Knowledge about how to program modern SDN controllers for specific use cases, complete with examples. We'll provide a course virtual machine, so—obviously—that is for keeps!

    Topics include:
    • History and evolution of SDN
    • Control and data plane separation
    • Control Plane and Modern Controllers (Ryu, ONOS)
    • Network Virtualization (Docker)
    • Data Plane (P4)
    • Programming SDNs (Frenetic)
    • Verification and Debugging
    • Use Cases and Looking Forward (SDX, security, etc.)
    Additional Materials:

    Our Vagrant installation may be updated before October, but the current version is available at https://github.com/PrincetonUniversity/Coursera-SDN
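
    The core idea (one logically centralized program computing the behavior of every switch) can be illustrated with a toy sketch. This is not the API of Ryu, ONOS, or any real controller; the topology, hosts, and policy below are entirely hypothetical:

```python
# Toy illustration of logically centralized control: one high-level
# policy is compiled into per-switch match/action flow rules.

TOPOLOGY = {            # hypothetical: switch -> {destination host: output port}
    "s1": {"h1": 1, "h2": 2},
    "s2": {"h1": 3, "h2": 4},
}

BLOCKED = {("h1", "h2")}   # a network-wide security policy, stated once

def flow_table(switch):
    """Translate the global policy into this switch's flow rules."""
    rules = []
    for src in ("h1", "h2"):
        for dst, port in TOPOLOGY[switch].items():
            if src == dst:
                continue
            action = "drop" if (src, dst) in BLOCKED else "fwd:%d" % port
            rules.append((src, dst, action))
    return rules
```

    The point of the sketch is that the security policy is written once, centrally, rather than being re-encoded in each device's low-level configuration.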

    Half Day Morning

    Jerome Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life, he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T4
    Introduction to Docker and Containers
    Jerome Petazzoni, Docker Inc.

    9:00 am–12:30 pm

    Thurgood Marshall West

    Docker is an open platform to build, ship, and run any Linux application, anywhere. It can be used in many ways: providing clean, isolated development environments; quickly spinning up test instances for CI purposes; ensuring coherence between development and production platform; and much more.

    This hands-on tutorial will give you an opportunity to dive in and see Docker in action. You will learn about Docker basic concepts, how to run containers, create your own images, interact with the "Docker Hub," and stack multiple containers to compose complex applications.

    Who should attend:

    This tutorial is relevant for both developers and sysadmins. If you have heard about Docker, but you haven't used it (or very little), this will get you started with a fast-paced, hands-on introduction.

    No previous knowledge of Docker or containers is required, but you will need some basic UNIX command-line skills.

    Take back to work:

    You will interact with a real Docker host and manipulate containers and images. After the tutorial, you will be able to use Docker in simple scenarios, and you will know where to look for further information to cover advanced use cases like orchestration, dynamic resource scheduling, load balancing, and more.

    Topics include:
    • What's Docker?
    • Running our first container
    • Understanding Docker images
    • Running a network service in a container
    • Building images interactively
    • Building images with Dockerfile
    • Pushing and pulling images
    • Development workflow with Docker
    • Network and volumes
    • Stacking complex apps with Compose
    Additional Materials:

    You won't need to pre-install Docker before the course; each student will be given credentials to connect to an individual virtual machine. All you need is a computer with an SSH client.
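
    To give a flavor of the "building images with Dockerfile" topic, here is a minimal sketch for a hypothetical Python web service (the file names and port are made up for illustration):

```dockerfile
# Hypothetical Dockerfile for a small Python web service
FROM python:2.7
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "app.py"]
```

    Such an image would be built with `docker build -t myapp .` and run with `docker run -p 8000:8000 myapp`.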

    Daniel Walsh, Red Hat, Inc.

    Daniel Walsh has worked in the computer security field for over 30 years. Dan joined Red Hat in August 2001. Dan has led the RHEL Docker enablement team since August 2013, and has been working on container technology for several years. He has led the SELinux project, concentrating on the application space and policy development. Dan helped develop sVirt, Secure Virtualization. He also created the SELinux Sandbox, the Xguest user, and the Secure Kiosk. Previously, Dan worked on Netect/Bindview's Vulnerability Assessment products and at Digital Equipment Corporation on the Athena Project and AltaVista Firewall/Tunnel (VPN) products. Dan has a BA in Mathematics from the College of the Holy Cross and an MS in Computer Science from Worcester Polytechnic Institute.

    T5
    Understanding SELinux
    Daniel Walsh, Red Hat, Inc.

    9:00 am–12:30 pm

    Lincoln 4

    The tutorial will cover basic SELinux concepts, but more from a usability point of view. What is SELinux trying to tell me? How does SELinux enforcement work? How does SELinux work in virtualization and containers? How can SELinux work in the enterprise?

    Who should attend:

    System administrators, security professionals, and developers.

    Take back to work:

    A greater understanding of SELinux; how to turn it on and leave it on. How to deal with SELinux issues.

    Topics include:
    • What is SELinux trying to tell me?
    • How does SELinux protect my virtualization and container systems?
    • How do I manage SELinux in the enterprise?
    Additional Materials:

    Attendees should have a VM that supports SELinux, preferably Fedora 22 or Red Hat Enterprise Linux 7.

    Half Day Afternoon

    Jerome Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life, he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T6
    Advanced Docker Concepts and Container Orchestration
    Jerome Petazzoni, Docker Inc.

    1:30 pm–5:00 pm

    Thurgood Marshall West

    You have installed Docker, you know how to run containers, and have written Dockerfiles to build container images for your applications (or parts of your applications). Now comes the next part: connecting multiple containers together and deploying them on a cluster of Docker hosts instead of your single development machine.

    This is what we will cover, along with some production-related questions: How to manage logs? Backups? Remote access? Security upgrades?

    In this workshop, we will present the networking model of containers and cover how to run containerized applications spanning multiple Docker hosts. We will give recipes to implement cleanly various ops tasks including logging, backups, and more. Finally, we will discuss orchestration and scheduling. We will present solutions like Swarm and Mesos, and see some of them in action on an actual microservices architecture.

    Who should attend:

    Developers and system administrators who are familiar with basic Docker concepts and want to learn how to deploy Docker for real-world applications.

    Take back to work:

    You will know how to deploy Docker "for real," and how to perform common operational tasks with containers. You will learn about patterns and best practices to scale applications with Docker and containers.

    Topics include:
    • Pre-requirements
    • VM environment
    • Our sample application
    • Running the whole app on a single node
    • Finding bottlenecks
    • Scaling workers on a single node
    • Scaling HTTP on a single node
    • Connecting to containers on other hosts
    • Abstracting connection details
    • Backups
    • Logs
    • Security upgrades
    • Network traffic analysis
    • Introducing Swarm
    • Setting up our Swarm cluster
    • Running on Swarm
    • Network plumbing on Swarm
    • Last words
    Additional Materials:

    You won't need to pre-install Docker before the course; each student will be given credentials to connect to an individual virtual machine. All you need is a computer with an SSH client.
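
    As a sketch of how an application's containers can be stacked and connected declaratively, here is a hypothetical Compose file in the version 1 syntax of that era; the service names and port are illustrative:

```yaml
# Hypothetical docker-compose.yml: a web tier linked to a backing store
web:
  build: .
  ports:
    - "8000:8000"
  links:
    - redis
redis:
  image: redis
```

    With such a file, `docker-compose up` would start both containers and wire the web tier to redis; the same description becomes the starting point for scheduling the services across a cluster.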

    Thomas A. Limoncelli, Stack Overflow

    Tom is an internationally recognized author, speaker, and system administrator. His new book, The Practice of Cloud System Administration, launched last year. His past books include Time Management for System Administrators (O'Reilly) and The Practice of System and Network Administration (Pearson). In 2005, he received the USENIX LISA Outstanding Achievement Award.

    He works in New York City at Stack Exchange, home of Careers.Stackoverflow.com, and previously worked at Google and Bell Labs. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    T7
    How to Not Get Paged: Managing On-call to Reduce Outages
    Thomas A. Limoncelli, Stack Overflow


    1:30 pm–5:00 pm

    Lincoln 4

    People think of “on call” as responding to a pager that beeps because of an outage. In this class, you will learn how to run an on-call system that improves uptime and reduces how often you are paged. We will start with a monitoring philosophy that prevents outages. Then we will discuss how to construct an on-call schedule—possibly in more detail than you've cared about before—but, as a result, it will be more fair and less stressful. We'll discuss how to conduct “fire drills” and “game day exercises” that create antifragile systems. Lastly, we'll discuss how to conduct a postmortem exercise that promotes better communication and prevents future problems.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:
    • Knowledge that makes being on call more fair and less stressful
    • Strategies for using monitoring to improve uptime and reliability
    • Team-training techniques such as "fire drills" and "game day exercises"
    • How to conduct better postmortems/learning retrospectives
    Topics include:
    • Why your monitoring strategy is broken and how to fix it
    • Building a more fair on-call schedule
    • Monitoring to detect outages vs. monitoring to improve reliability
    • Alert review strategies
    • Conducting “fire drills” and “game day exercises”
    • "Blameless postmortem documents"
    Additional Materials:
    Monday, February 22, 2016

    Half Day Morning

    Gary A. Grider, Los Alamos National Laboratory

    Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. As Division Leader, Gary is responsible for all aspects of High Performance Computing technologies and deployment at Los Alamos. Additionally, Gary is responsible for managing the R&D portfolio for keeping the new technology pipeline full to provide solutions to problems in the Lab’s HPC environment, through funding of university and industry partners.

    Gary is also the US Department of Energy Exascale Storage, IO, and Data Management National Co-Coordinator. In this role, Gary helps manage the U.S. government investments in Data Management, Mass Storage, and IO. Gary has 30 active patents/applications in the data storage area and has been working in HPC and HPC-related storage since 1984.

    John Bent, EMC

    John Bent, currently of EMC, soon to be of Dell, formerly of Los Alamos National Lab, has been working on storage systems for over 20 years. After completing his data-aware scheduling dissertation at Wisconsin in 2005, John spent the next 10 years working for Gary designing, maintaining, and measuring some of the world's largest parallel storage systems. Now at EMC, John works in the Office of the CTO helping design and map EMC storage products to emerging workloads in both Enterprise and Extreme IO.

    Some of John’s more influential research has been the Parallel Log-structured File System and the DOE sponsored FastForward project prototyping an exascale storage system with Intel and The HDF Group.  John is a former anthropology major who spent two years spearfishing on the equator while working as a Peace Corps volunteer.  

    Mark Gary, Lawrence Livermore National Laboratory

    Mark Gary is a Deputy Division Leader for the Livermore Computing Division within Computations. In this role, Mark is responsible for the 24x7 operation of LLNL's world-class computing environment. Livermore Computing provides reliable high-performance computers, infrastructure, and services (networks, data archive, operations, file systems, system software, visualization, system administration, user assistance, and consultation) in support of LLNL missions. Mark leads projects ranging from integrated LC planning efforts to external collaborations in support of extreme-scale computing and storage futures.

    Mark has worked on all aspects of High Performance Computing at Livermore over the last 31 years. While the primary focus of his work has been on mass storage and parallel file systems, Mark has also worked on operating systems, driver, and kernel development. He is a co-author of HPSS and UniTree archival storage systems. Mark has co-managed successful government/industry collaborations over the last three decades and has led archival storage and Lustre file system development and operations teams.

    Mark received his B.S. in Computer Science from the University of California, Santa Barbara, in 1984.

    Nicholas Lewis, University of Minnesota and Los Alamos National Laboratory

    Nicholas Lewis is a Ph.D. candidate in the History of Science, Technology, and Medicine Program at the University of Minnesota, Twin Cities.  He received a master's in history from the University of Utah in 2011, and has undergraduate degrees in history and anthropology from Weber State University.  He worked in IT before joining the Charles Babbage Institute's NSF History of Computer Security Project as a graduate research assistant.  He currently works as a GSRA on the History of Supercomputing Project, a collaborative effort between CBI and the High-Performance Computing Division at Los Alamos National Laboratory, where he is currently conducting dissertation research.

    M1
    Innovations, Challenges, and Lessons Learned in HPC Storage Yesterday, Today, and Tomorrow
    Gary A. Grider, Los Alamos National Laboratory

    John Bent, EMC

    Mark Gary, Lawrence Livermore National Laboratory

    Mark Gary is a Deputy Division Leader for the Livermore Computing Division within Computations. In this role, Mark is responsible for the 24x7 operation of LLNL's world-class computing environment. Livermore Computing provides reliable high-performance computers, infrastructure, and services (networks, data archive, operations, file systems, system software, visualization, system administration, user assistance, and consultation) in support of LLNL missions. Mark leads projects ranging from integrated LC planning efforts to external collaborations in support of extreme-scale computing and storage futures.

    Mark has worked on all aspects of High Performance Computing at Livermore over the last 31 years. While the primary focus of his work has been on mass storage and parallel file systems, Mark has also worked on operating systems, driver, and kernel development. He is a co-author of HPSS and UniTree archival storage systems. Mark has co-managed successful government/industry collaborations over the last three decades and has led archival storage and Lustre file system development and operations teams.

    Mark received his B.S. in Computer Science from the University of California, Santa Barbara, in 1984.

     
    Nicholas Lewis, University of Minnesota and Los Alamos National Laboratory

    Nicholas Lewis is a Ph.D. candidate in the History of Science, Technology, and Medicine Program at the University of Minnesota, Twin Cities.  He received a master's in history from the University of Utah in 2011, and has undergraduate degrees in history and anthropology from Weber State University.  He worked in IT before joining the Charles Babbage Institute's NSF History of Computer Security Project as a graduate research assistant.  He currently works as a GSRA on the History of Supercomputing Project, a collaborative effort between CBI and the High-Performance Computing Division at Los Alamos National Laboratory, where he is currently conducting dissertation research.

    9:00 am–12:30 pm

    In this tutorial, we will introduce the audience to the lunatic fringe of extreme high-performance computing and its storage systems. The most difficult challenge in HPC storage is caused by millions (soon to be billions) of simultaneously writing threads.  Although cloud providers handle workloads of comparable, or larger, aggregate scale, the HPC challenge is unique because the concurrent writers are modifying shared data. 

    We will begin with a brief history of HPC covering the previous few decades, bringing us into the petaflop era, which started in 2009.  Then we will discuss the unique computational science in HPC so that the audience can understand the unavoidability of its unique storage challenges.  We will then move into a discussion of archival storage and the hardware and software technologies needed to store today’s exabytes of data forever.  From archive we will move into the parallel file systems of today and will end the lecture portion of the tutorial with a discussion of anticipated HPC storage systems of tomorrow.  Of particular focus will be namespaces that handle concurrent modifications to billions of entries, which we believe will be the largest challenge of the exascale era.

    The tutorial will end with a free-ranging, audience-directed panel.

    Topics include:
    • A brief history lesson about the past 30 years of supercomputers
    • An understanding of what makes HPC computing unique and the entailing storage challenges 
    • An overview of current HPC storage technologies such as burst buffers, parallel file systems, and archival storage
    • A glimpse into the future of HPC storage technologies for both hardware and software 
    • Insights into unique research opportunities to advance HPC storage


    M2
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (Hadoop File System), and OpenStack.


    Half Day Afternoon


    M3
    An Overview of On-Premise File and Object Storage Access Protocols
    Dean Hildebrand, IBM Research—Almaden

    Dean Hildebrand is a Research Staff Member and Master Inventor at the IBM Almaden Research Center and a recognized expert in scalable file systems and object stores. He has authored numerous scientific publications, created over 30 patents, and been the technical program chair and sat on the program committee of numerous conferences. Dr. Hildebrand pioneered pNFS, demonstrating the feasibility of providing standard and scalable access to any file system. He received a B.Sc. degree in computer science from the University of British Columbia in 1998 and M.S. and Ph.D. degrees in computer science from the University of Michigan in 2003 and 2007, respectively.

     
    Bill Owen, IBM

    Bill Owen is a Senior Engineer with the IBM Spectrum™ Scale development team. He is  responsible for the integration of OpenStack with Spectrum Scale, focusing on the Swift object, Cinder block, and Manila file storage components of OpenStack. He has worked in various development roles within IBM for over 15 years. Before joining IBM, Bill developed and deployed grid management systems for electric utilities. Bill holds B.Sc. and M.S. degrees in Electrical Engineering from New Mexico State University.

    1:30 pm–5:00 pm

    This tutorial will provide a technical overview of the latest distributed file and object access protocols. The goal is to provide administrators and developers with the knowledge to choose the best data access protocol for their new applications or determine if their existing file-based applications are good candidates for being ported to using an object access protocol.

    For decades, distributed file systems such as NFS have been the sole method for applications to work with remote data. The emergence of mobile devices, tablets, and the Internet of Things, combined with the global demand for cloud storage, has given rise to numerous new object storage access protocols. While these new protocols are simpler in many ways, and offer several new features, they also come with their own set of access semantics that may cause problems for applications.

    We will cover and contrast NFSv4/v4.1 with both the S3 and Swift object protocols, as well as discuss the challenges of providing both file and object access to a single dataset, including such topics as common identity, ACL, and quota management.



    M4
    Erasure Encoding—Practice and Principles
    Jason Resch, Cleversafe Inc.

    Jason Resch has 17 years of professional software engineering experience and is presently a Software Architect at Cleversafe, Inc.—a company that pioneered applying Erasure Codes to Object Storage. In his nine years at Cleversafe, Jason specialized in developing new algorithms to improve Erasure Code performance and security, and techniques for rebuilding Erasure Coded data. He has 133 issued and 310 pending patents as well as numerous technical conference presentations and published journal papers. Jason graduated from Illinois Institute of Technology in 2006 with a B.S. in Computer Science with a specialization in information security and a minor in psychology. He was recently awarded an IIT Outstanding Young Alumnus Award and is listed in Crain's Chicago Business Tech 50 list (2015).

     
    W. David Schwaderer

    W. David Schwaderer presently consults for Silicon Valley enterprises, many of them specializing in data storage technologies. As a multidisciplinary technologist, he has authored 11 technical books on a wide spectrum of topics ranging from data storage systems, data management, communication signaling, C Language programming, and ASIC core interfacing to Digital Image Processing. David has presented at IEEE and USENIX conferences, Stanford, MIT, Intel, Google, Sun/Oracle Labs, and across greater Silicon Valley. His four innovation Google TechTalks on YouTube have recorded over 40,400 views. David has a master's degree in Applied Mathematics from the California Institute of Technology and an MBA from the University of Southern California. At his recent Joint IEEE Comsoc-CEsoc SCV presentation titled "Broadcast Storage for Video-Intensive Worlds", he was accorded the title "Silicon Valley Icon."

    1:30 pm–5:00 pm

    It's common knowledge that the volume of global data has exploded. Simultaneously, the challenge to store, protect, and access this data securely "at scale" has produced hyperscale hardware and software architectures that continue to subduct traditional enterprise datacenter systems. These new architectures will prove essential in responding to the unrelenting global "data tsunami".

    One important hyperscale data storage methodology is Object Storage. Object Storage often uses Erasure Coding as a means to reduce data loss probabilities while simultaneously economizing data storage capital costs. Erasure Coding's powerful principles are also found in numerous other data retention methodologies, including Information Dispersal Algorithm (IDA) deployments and Secret Sharing, a method of providing shared-data security.

    Unfortunately, understanding Erasure Coding's deployment strategies and powerful foundations can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to many engineers. Luckily, that's totally unnecessary.

    The first part of this tutorial will provide a brief Object Storage and Erasure Coding introduction as a backdrop for a deep exploration of effective Erasure Coding deployment strategies, including performance and bandwidth tradeoff considerations. It will also introduce IDA and Secret Sharing and briefly discuss their relation to Erasure Coding.

    After an intermission, the second part of the tutorial will provide a programming lab based on Python 2.7 programs distributed on the FAST '16 Tutorial Sessions USB thumb drive. This lab should help cement Erasure Code principles and deployment considerations as well as demonstrate their utility. As an example, the programs will illustrate Erasure Code operations using tables as well as on-the-fly calculations—useful in configurations where it is necessary to trade processing cycles for addressable memory.

    This tutorial portion will conclude with an intense but extremely accessible discussion of Erasure Coding principles that will interest attendees desiring a deeper understanding of how Erasure Codes achieve their results. This material will be devoid of the impenetrable mathematical jargon typically prevalent in Erasure Code literature. The discussion progressively examines various Galois Finite Fields in detail, with a brief discussion of GF(2^16).
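    The table-versus-on-the-fly tradeoff mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the tutorial's lab code (which uses Python 2.7); it assumes the commonly used GF(2^8) field with reducing polynomial 0x11b and generator 3.

```python
def gf_mul_slow(a, b, poly=0x11b):
    """Bitwise ("on-the-fly") multiplication in GF(2^8): no tables, more CPU."""
    r = 0
    while b:
        if b & 1:
            r ^= a           # add (XOR) the current partial product
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce modulo the field polynomial
        b >>= 1
    return r

# Precompute log/antilog tables from a generator, enabling table-driven
# multiplication: a*b = EXP[LOG[a] + LOG[b]]. Trades memory for cycles.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_slow(x, 3)    # 3 generates the multiplicative group
for i in range(255, 512):
    EXP[i] = EXP[i - 255]    # doubled table avoids a modulo in the hot path

def gf_mul_table(a, b):
    """Table-driven multiplication in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

    Both routines compute the same products; which one wins depends on whether the deployment can afford the table memory, exactly the tradeoff the lab programs explore.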

    Finally, the tutorial will include discussion from the forthcoming book titled Exabyte Data Preservation, Postponing the Inevitable, co-authored by the speakers and Dr. Ethan Miller of University of California, Santa Cruz.

    Topics include:
    • Brief Object Storage Introduction
    • Erasure Coding and Object Storage
    • Erasure Coding Deployment Strategy and Tradeoff Considerations
    • Information Dispersal Algorithm and Secret Sharing
    • Understanding Galois Finite Fields
    • Galois Finite Field Computations (made extremely accessible)
    • Python 2.7 Galois Finite Field Computation Demonstration Programs
    • Python 2.7 programming lab
    Sunday, December 4, 2016

    Full Day


    S1
    Linux Performance Tuning
    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    9:00 am–5:00 pm

    Constitution Ballroom A

    The Linux operating system is commonly used both in the data center and for scientific computing applications; it is used in embedded systems as small as a wristwatch, as well as in large mainframes. As a result, the Linux system has many tuning knobs so that it can be optimized for a wide variety of workloads. Some tuning of the Linux operating system has been done "out of the box" by enterprise-optimized distributions, but there are still many opportunities for a system administrator to improve the performance of his or her workload on a Linux system.

    This class will cover the tools that can be used to monitor and analyze a Linux system, and key tuning parameters to optimize Linux for specific server applications, covering the gamut from memory usage to filesystem and storage stacks, networking, and application tuning.

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • Filesystem and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies


    S2
    Automation Tools Bootcamp
    Tyler Fitch, Chef

    Tyler is an Architect in Chef’s Customer Success program, championing successful patterns and delightful experiences in automation to enterprise customers. Prior to working at Chef, he spent a decade as an engineer for Adobe, developing and automating commerce services for adobe.com using a variety of technologies. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

    9:00 am–5:00 pm

    Constitution Ballroom B

    Overview
    The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform and Artifactory to deploy a small application in local VMs.

    We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

    Who should attend:

    Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.

    Take back to work:

    These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

    Topics include:

    Vagrant, Chef, Packer, Docker, Terraform, and Artifactory



    S3
    SRE Classroom: Non-Abstract
    Salim Virji, Google

    Salim Virji is a Site Reliability Engineer at Google. He has worked on infrastructure software, back-end systems, front-end applications, and delightful ways to connect them all. He lives and works in New York City.

    9:00 am–5:00 pm

    Commonwealth Ballroom

    With this hands-on tutorial, you will develop an understanding of how to design, build, and run reliable Internet services at large scale.

    Who should attend:

    This tutorial is suitable for executives who need to specify and evaluate systems, engineers who build systems, and IT professionals who want to run first-class services built with reliable systems.

    Take back to work:

    You will take back an understanding of how to evaluate system designs, how to specify and build large systems, and how to operate these systems in the real world in a way that will scale as the system grows.

    Topics include:
    • Designing Reliable Systems
    • Building Reliable Systems
    • Running Reliable Systems

    Half Day Morning


    S4
    Professional Conduct and Ethics for System Administrators
    Lee Damon, University of Washington

    Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.

    9:00 am–12:30 pm

    Gardner Room

    This introductory tutorial will start by examining some of the ethical responsibilities that come along with access to other users' data, accounts, and confidential information. We will look at several case studies involving both local and cloud usage. All attendees are strongly encouraged to participate in the discussion. Numerous viewpoints will be considered in order to give students a perspective from which to develop their own reasoned response to ethical challenges.

    Who should attend:

    Anyone who is a system administrator or has access to personal/confidential information, or anyone who manages system administrators or makes policy decisions about computer systems and their users. There are no prerequisites for this class.

    Take back to work:

    After completing this tutorial you will be better prepared and able to resolve ethically questionable situations and will have the means to support your decisions.

    Topics include:
    • Why it is important to set your ethical standards before it comes up
    • Who is impacted by "expectations of ethical conduct"
    • Why this isn't just an expectation of system administrators
    • Implicit expectations of ethical behavior
    • Ethics and The Cloud
    • Coercion to violate ethics
    • Well-intentioned violations of privacy
    • Collection, retention, and protection of personal data
    • Management directives vs. friendships
    • Software piracy/copying in a company, group, or department


    S6
    Statistics for Operations: Making Sense out of Data
    Kyrre Begnum, Oslo University College of Applied Sciences

    Kyrre Begnum works as an Associate Professor at Oslo and Akershus University College of Applied Sciences where he teaches sysadmin courses at the MSc and BSc levels. Kyrre holds a PhD from the University of Oslo with a focus on understanding the behavior of large systems. He has experience with large scale virtual machine management, cloud architectures and developing sysadmin tools. His research focus is on practical and understandable approaches that bring advanced models to real life scenarios.

     
    Nicole Forsgren, DORA

    Dr. Nicole Forsgren is an IT impacts expert who shows leaders and practitioners how to unlock the potential of technology change in their organizations. Best known for her work with tech professionals and as the lead investigator on the State of DevOps Reports, she is CEO and Chief Scientist at DORA (DevOps Research and Assessment) and an Academic Partner at Clemson University. In a previous life, she was a professor, sysadmin, and hardware performance analyst.

    9:00 am–12:30 pm

    Fairfax Room

    This tutorial is a course in statistics with a specific focus on system administrators and the types of data they face. We assume little prior knowledge of statistics, cover the most common concepts in descriptive statistics, and apply them to data taken from real-life examples. Our aim is to provide insight into which methods support good interpretation of data: distributions, probabilities, and formulating basic statements about the properties of observed data.

    The first part will cover descriptive statistics for single datasets, including mean, median, mode, range, and distributions. When discussing distributions, we will cover probabilities through percentiles (e.g., a normal distribution is very uncommon in ops data). This session will use a prepared dataset and spreadsheet (LibreOffice or OpenOffice, because it works on all platforms). We have data on the number of players of an online game over a 6-month period. In this exercise, we will analyze the distribution and try to make statements like, “What is the likelihood that we see more than 27,000 simultaneous players?” One of the lessons is that the top 5% of the distribution accounts for almost a doubling in players, which is interesting. We then extend the discussion to include organizational implications: imagine that your job is to buy resources for a service like this, and you have to double your rig to cope with something that is only 5% likely to happen. How would you explain that in a meeting?
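    The percentile exercise above can be sketched in a few lines of Python. The dataset here is synthetic (a lognormal stand-in for the tutorial's six months of player counts), and the 27,000-player threshold is taken from the example question; `percentile` is an ordinary nearest-rank implementation, not the tutorial's spreadsheet formula.

```python
import math
import random
import statistics

random.seed(1)
# Hypothetical hourly player counts over ~6 months (4320 samples),
# drawn from a skewed (lognormal) distribution, as ops data often is.
players = [int(random.lognormvariate(9.6, 0.45)) for _ in range(4320)]

def percentile(data, p):
    """Empirical p-th percentile (nearest-rank method)."""
    s = sorted(data)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]

median = statistics.median(players)
p95 = percentile(players, 95)
# Empirical probability of seeing more than 27,000 simultaneous players
exceed = sum(1 for x in players if x > 27000) / len(players)
print(f"median={median:.0f}  95th pct={p95}  P(>27000)={exceed:.3f}")
```

    With skewed data like this, the 95th percentile sits far above the median, which is exactly the "top 5% nearly doubles the players" effect the session discusses.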

    The second part will discuss comparisons using two common methods that can be calculated in a spreadsheet: correlations and regressions. Correlations will be used as a tool to identify interesting relationships among data; ranked correlation may be considered for two datasets that have the same “flow” but on separate ranges (e.g., the correlation between web requests and database requests). Regression can also be used to identify relationships. For example, using a regression plot between two variables, one could identify bottlenecks by comparing the load of two tiers (db tier vs. web tier). In a scalable system, we would expect a nice 45-degree linear relationship between the two. However, if the database tier struggles before the web tier, we would see the linear approximation slope upward (if the db load is on the y axis) as the load increases.
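    The tier-comparison idea can be sketched as follows. This is an illustrative sketch with made-up load numbers, not course material: it computes a Pearson correlation and a least-squares slope of db-tier load against web-tier load, the two quantities a spreadsheet would give you.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def slope(xs, ys):
    """Least-squares regression slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

web = [10, 20, 30, 40, 50, 60]              # web-tier load samples (made up)
db_healthy = [11, 19, 31, 39, 51, 59]       # ~45-degree line: scales cleanly
db_struggling = [10, 22, 38, 60, 90, 130]   # db load outpacing web load

print(pearson(web, db_healthy))   # near 1.0: tiers move together
print(slope(web, db_healthy))     # near 1.0: the 45-degree case
print(slope(web, db_struggling))  # well above 1: a db-tier bottleneck
```

    A slope near 1 is the healthy 45-degree case; a slope climbing well past 1 is the upward bend that signals the database tier saturating first.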

    Throughout, we will focus on takeaways, coupling the different statistical methods with the types of answers they can provide, like: “Can the average of a dataset explain the outer limits of my data?” It is easy to fall off the wagon with a topic like statistics. We are aware of this risk and will utilize active-learning tools such as Socrative and Kahoot to engage the audience and encourage participation.

    Who should attend:

    Sysadmins who are faced with data overload and wish they had some knowledge of how statistics can be used to make more sense of it. We assume little prior knowledge of statistics, but a basic mathematical proficiency is recommended.

    Take back to work:
    • A fundamental understanding of how descriptive statistics can provide additional insight into the data of the sysadmin world, and a foundation for further self-study of statistics
    • A basic set of statistical approaches that can be used to identify fundamental properties of the data attendees see in their own environments, and to identify patterns in that data
    • The ability to make accurate and clear statements about their metrics that are valuable to the organization
    Topics include:
    • Descriptive statistics for single datasets, including: mean, median, mode, range, and distributions
    • Basic analysis of distributions and probabilities using percentiles typically seen in ops
    • Interpretation of analyses to include team and business implications
    • Regression analysis to suggest predictive relationships, with an emphasis on interpretation and implications
    • Correlation analysis and broad pattern detection (if time allows)
    Additional Materials:

    Half Day Afternoon

    Thomas Limoncelli, StackOverflow.com

    Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com. He's previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs and tweets @YesThatTom. He lives in New Jersey.

    S5
    How to Not Get Paged: Managing On-Call to Reduce Outages
    Thomas Limoncelli, StackOverflow.com


    1:30 pm–5:00 pm

    Fairfax Room

    People think of “on call” as responding to a pager that beeps because of an outage. In this class, you will learn how to run an on-call system that improves uptime and reduces how often you are paged. We will start with a monitoring philosophy that prevents outages. Then we will discuss how to construct an on-call schedule, possibly in more detail than you've cared about before; as a result, it will be more fair and less stressful. We'll discuss how to conduct “fire drills” and “game day exercises” that create antifragile systems. Lastly, we'll discuss how to conduct a postmortem exercise that promotes better communication and prevents future problems.

    Who should attend:

    Managers or sysadmins with on-call responsibility

    Take back to work:
    • Knowledge that makes being on call more fair and less stressful
    • Strategies for using monitoring to improve uptime and reliability
    • Team-training techniques such as "fire drills" and "game day exercises"
    • How to conduct better postmortems/learning retrospectives
    Topics include:
    • Why your monitoring strategy is broken and how to fix it
    • Building a more fair on-call schedule
    • Monitoring to detect outages vs. monitoring to improve reliability
    • Alert review strategies
    • Conducting “fire drills” and “game day exercises”
    • "Blameless postmortem documents"
    Additional Materials:
    S7
    Documentation Techniques for System Administrators
    1:30 pm–5:00 pm

    Gardner Room

    Sysadmins freely acknowledge how important documentation is to their daily lives, and in the same sentence will loudly complain that they don’t have time to produce documentation. This class is about how to produce effective, useful, and timely documentation as part of your normal sysadmin activities. Particular emphasis is placed on documentation as a time-saving tool rather than a workload imposition.

    Who should attend:

    System administrators of all types and levels who need to produce documentation for the systems they manage, or who want to improve their documentation skills. Documentation can be the difference that turns you from a good sysadmin to a great sysadmin!

    Take back to work:
    • The skills to improve personal and team documentation quality
    • A solid understanding of how to establish and maintain effective documentation practices
    Topics include:
    • Why system administrators need to document
    • Documentation as part of your daily workflow
    • Targeting your audience
    • Common mistakes made in documentation
    • Tools to assist the documentation process (including effective use of wikis)
    Additional Materials:
    Monday, December 5, 2016

    Half Day Morning

    Courtney Eckhardt, Heroku

    Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don Norman into a wide-reaching and humane concept of operational reliability.

    Lex Neva, Heroku

    Lex Neva is probably not a super-villain. He has six years of experience keeping large services running, including Linden Lab's Second Life, DeviantArt.com, and his current position as a Heroku SRE. While originally trained in computer science, he’s found that he most enjoys applying his software engineering skills to operations. A veteran of many large incidents, he has strong opinions on incident response, on-call sustainability, and reliable infrastructure design, and he currently runs SRE Weekly (sreweekly.com).

    M1
    "I Never Want to Live through This Again!": Running Excellent Retrospectives
    Courtney Eckhardt, Heroku

    Lex Neva, Heroku


    9:00 am–12:30 pm

    Fairfax Room

    Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?

    Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.

    This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.

    Who should attend:

    Managers, tech leads, and anyone interested in retrospective culture and iterating on processes.

    Take back to work:

    Attendees will have the materials and first-hand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.

    Topics include:
    • Why run a retrospective
    • Goal of a retrospective
    • Blameless retrospectives
    • Facilitating: redirecting blame, reframing, drawing people out
    • How to structure a retrospective
    • Preparing for a retrospective
    • Five “why”s/infinite “how”s
    • Human error
    • Achieving follow-through on remediation items
    Additional Materials:

    Ben Breard, Red Hat

    Ben Breard is the Technology Product Manager for Linux Containers at Red Hat, where he focuses on driving the container roadmap and RHEL Atomic Host, and evangelizes open source technology in his free time. Previously he was a Solutions Architect and worked closely with key customers around cloud/systems management, virtualization, and all things RHEL. Ben joined Red Hat in 2010 and currently works out of Dallas, Texas.

    M4
    Demystifying Systemd
    Ben Breard, Red Hat


    9:00 am–12:30 pm

    Constitution Ballroom B

    It's 2016, and at this point why would anyone care about an init system? Well, not only is process management essential to the operating system, but all the hype around things like containers and resource management is making this topic sexy. This session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who's new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
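As a taste of the "anatomy of a systemd unit file" topic, a minimal service unit might look like the following; the unit name, binary path, and values are hypothetical, for illustration only:

```ini
# /etc/systemd/system/example-app.service (hypothetical example)
[Unit]
Description=Example application server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/example-app --port 8080
Restart=on-failure
# Resource management via systemd's cgroups interface:
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```

After editing a unit, `systemctl daemon-reload` followed by `systemctl start example-app` would activate it.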

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Understanding of how systemd works, where to find the configuration files, and how to maintain them.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system logging with the journal
    • Resource management via systemd's cgroups interface
    • Simple security management with systemd and the kernel's capabilities
    • systemd, containers, and virtualization
    Additional Materials:

    John H. Nyhuis

    John H. Nyhuis is an Infrastructure Engineer, serving as IT Director at the Altius Institute for Biomedical Sciences. He brings 20 years of experience in Infrastructure Engineering and IT Management within industry, academic, and medical environments, including extensive experience with scalable system architecture, implementation, optimization, and deployment:

    • Leadership: Experienced at building consensus in highly diverse environments. Project Management (Scrum and LEAN), Risk Management, IT audits and remediation, HIPAA, FERPA
    • Management: Expense Controls, Budgeting, Employee Management, Project Proposals, Process Improvement
    • Vendor Relations: Contract Negotiation, Fundraising / Equipment Donations
    • Architecture/Design: Cloud Computing, Virtualization, Automation, Scalability, Root Cause Analysis
    • Deployments: Massively Parallel Implementations, Global Deployments, Code Management, Release Testing

    In his free time, John serves as an Economic Development Commissioner for the City of Lake Forest Park, in the great state of Washington.

    Lee Damon, University of Washington

    Lee Damon has a B.S. in Speech Communication from Oregon State University. He has been a UNIX system administrator since 1985 and has been active in SAGE (US) & LOPSA since their inceptions. He assisted in developing a mixed AIX/SunOS environment at IBM Watson Research and has developed mixed environments for Gulfstream Aerospace and QUALCOMM. He is currently leading the development effort for the Nikola project at the University of Washington Electrical Engineering department. Among other professional activities, he is a charter member of LOPSA and SAGE and past chair of the SAGE Ethics and Policies working groups. He chaired LISA '04 and co-chaired CasITConf '11, '13, and '14.

    M6
    Advanced Communication: Practical Tactics and Strategy
    John H. Nyhuis

    Lee Damon, University of Washington


    9:00 am–12:30 pm

    Gardner Room

    Systems administrators are expected to be intelligent, dedicated, and professional experts in our field. Yet compared to other professions requiring similar education, we often do not receive credit for our efforts, and we receive less respect from our fellow workers.

    This problem doesn’t just affect our personal well-being; businesses make poorer decisions when input from technical people is disregarded or overlooked. As professionals, we are all expected to step up and defend ourselves, our teams, and our projects. Being able to communicate meaningfully and accurately is critical to our success.

    This tutorial will provide practical techniques for both in-person and written interpersonal challenges. Difficult conversations are a part of life as well as business and we need to develop the tools for dealing with them. We will review materials from several sources including our own experiences and will have practical exercises to work through to give attendees a strong starting point for their own difficult communication challenges.

    Who should attend:

    IT Professionals and anyone who must deal with difficult people under stressful conditions.

    Take back to work:
    • How to deal effectively with verbal and written conflict
    • How to identify and stop verbal and written abuse
    • How to maximize your chances to succeed in difficult conversations
    Topics include:
    • E.I.Q. and how to use it
    • Satir Modes of Conversation
    • Verbal Jujitsu
    • Lifescripts
    Additional Materials:

    Full Day

    Caskey L. Dickson, Microsoft Corporation

    Caskey L. Dickson is a Site Reliability Engineer at Microsoft where he is part of the leadership team reinventing operations at Azure. Before that he was at Google where he worked as an SRE/SWE, writing and maintaining monitoring services that operate at "Google scale" as well as business intelligence pipelines. He has worked in online services since 1995 when he turned up his first web server and has been online ever since. Before working at Google, he was a senior developer at Symantec, wrote software for various Internet startups such as CitySearch and CarsDirect, ran a consulting company, and even taught undergraduate and graduate computer science at Loyola Marymount University. He has a B.S. in Computer Science, a Masters in Systems Engineering, and an M.B.A from Loyola Marymount.

    M3
    Everything You Wanted to Know about Operating Systems but Were Afraid to Ask
    Caskey L. Dickson, Microsoft Corporation


    9:00 am–5:00 pm

    Constitution Ballroom B

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of why "it's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.
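As a taste of the material: Bélády's anomaly, mentioned above, is the counterintuitive fact that under FIFO page replacement, giving a process more frames can produce more page faults. A short Python sketch using the classic textbook reference string:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO page replacement."""
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()  # evict the page resident longest
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # -> 9 faults with 3 frames
print(fifo_faults(refs, 4))  # -> 10 faults with 4 frames: the anomaly
```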

    Buckle up.

    Who should attend:
    • All admins who did not take the Comp-Sci academic route and never had a course in OS internals
    • Inexperienced admins whose coursework or training didn't cover OS internals in the depth it should have (modern OS courses have become a shadow of their former selves and commonly require writing no OS code)
    • More experienced admins who haven't had to address these sorts of issues on a regular basis, who probably know a lot about some individual aspects but could benefit from putting everything into a broader context
    Take back to work:

    Attendees will gain a deeper understanding of what goes on inside the kernel and the areas where things can go wrong. We'll explore how little the concept of "system load" captures about the true system state, and attendees will be prepared to improve both their operational response methodologies as well as their monitoring goals.

    Topics include:

    Morning:

    • Scheduling and Process Management
    • Memory Management and the MMU
    • Virtualization and its impact on these

    Afternoon:

    • File System Architecture for sysadmins, covering ext2/3/4, NTFS, and ZFS
    • Storage layer performance, disks, RAID, and SANs
    • The impact of virtualization on these
    Additional Materials:

    Nick Feamster, Princeton University

    Nick Feamster is a professor in the Computer Science Department at Princeton University and the Acting Director of the Princeton University Center for Information Technology Policy (CITP). Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, with an emphasis on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the SIGCOMM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), the NSDI conference (fault detection in router configuration), USENIX Security (circumventing web censorship using Infranet), and USENIX Security (web cookie analysis).

    M8
    Software Defined Networking for Network Administrators
    Nick Feamster, Princeton University


    9:00 am–5:00 pm

    9:30 am–12:30 pm: Commonwealth Ballroom
    1:30 pm–5:00 pm: Back Bay Ballroom D (LISA Lab)

    The tutorial will cover topics in Software Defined Networking (SDN) in a presentation format that is oriented towards network and system administrators. SDN separates the network's control plane (the software that controls how the network behaves) from its data plane (the routers and switches in the network that forward packets).

    This course will cover the aspects of Software Defined Networking that relate most closely to network operations. We will divide the course into four parts:

    1. Overview and motivation of SDN
    2. Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
    3. Network virtualization technologies
    4. Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)

    The after-lunch portion of this class will be held in the LISA Lab.

    Who should attend:

    Beginner and Intermediate Virtual Infrastructure Administrators

    Take back to work:

    Attendees will take back knowledge about SDN that will help them evaluate whether it is an appropriate technology to apply in their own networks.

    Attendees will better understand what SDN is, the types of problems that it can (and cannot) solve, the capabilities of current software controller platforms, and the capabilities (and shortcomings) of existing hardware switches.

    The course will also include "war stories" from successful (and stunted) SDN deployments that will help attendees better evaluate the suitability of SDN for solving their own network management problems.

    Topics include:
    1. Overview and motivation of SDN
    2. Commercial operational SDN controllers (Ryu, ODL) and switch capabilities
      • Ryu
      • Open Daylight
      • An overview of hardware switch capabilities
    3. Network virtualization technologies
      • Nicira NSX
      • FlowVisor
    4. Network operations use cases (including SDN for the wide area, data centers, home networks, and wireless)
      • SDX: Software Defined Internet Exchange Points
      • SD-WAN: SDN in Wide Area Networks
      • SDN in home networks
      • SDN in data centers
    Additional Materials:

    Half Day Afternoon

    Robert A. Ballance, Ph.D.

    Dr. Robert Ballance honed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the Long Now Foundation. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989. He is currently serving as a White House Presidential Innovation Fellow.

    M2
    An Introduction to R for System Administrators
    Robert A. Ballance, Ph.D.


    1:30 pm–5:00 pm

    Commonwealth Room

    Data analysis is not just about discovery; it’s about communication. Good communication tells stories. Savvy system administrators provide their management with the background needed to maintain operations, manage budgets, and support users, and provide their coworkers with the insights needed to keep their systems solid.

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline—coming up to speed on R, accessing data, analyzing data, and getting the message out.

    This tutorial is designed to:

    • motivate you to pick up R
    • demonstrate useful techniques using R
    • illustrate ways to simplify your life by automating data analysis and reporting

    In-class demonstrations will be augmented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided. If you plan on working on the exercises, install R and (optionally) RStudio.

    Who should attend:

    System administrators who are awash in operational data and want to do a more efficient job of understanding their data and communicating their findings. Facility with programming and knowledge of basic descriptive statistics is assumed. Prior knowledge of R is not required.

    Take back to work:
    • Acquaintance with R, R packages, and RStudio
    • Understanding where R fits into the system administrator’s tool set
    • Familiarity with basic R data-manipulation techniques
    • Motivation to learn or improve your R skills
    • Next steps in learning and mastering R
    Topics include:
    • Introduction to the R ecosystem
    • R as a language
    • Basic programming in R
    • The data analysis workflow
    • Reading and writing data from files and pipes
    • Data frames and data frame manipulations
    • Exploratory analysis
    • Using the ggplot2 package for graphing
    • Other useful R packages

    Examples will be based on situations encountered during routine system operations.

    Additional Materials:

    Thomas Limoncelli, StackOverflow.com

    Tom is an internationally recognized author, speaker, system administrator, and DevOps advocate. His latest book, the 3rd edition of The Practice of System and Network Administration, launched last month. He is also known for The Practice of Cloud System Administration and Time Management for System Administrators (O'Reilly). He works in New York City at StackOverflow.com. He's previously worked at Google, Bell Labs/Lucent, AT&T, and others. He blogs and tweets @YesThatTom. He lives in New Jersey.

    M7
    Personal Time Management: The Basics for Sysadmins That Are Overloaded
    Thomas Limoncelli, StackOverflow.com


    1:30 pm–5:00 pm

    Fairfax Room

    Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.

    This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • Why typical “time management” strategies don’t work for sysadmins
    • What makes “to-do” lists fail, and how to make them work
    • How to eliminate “I forgot” from your vocabulary
    • How to manage interruptions: preventing them, managing the ones you get
    • Delegating to coworkers without them knowing
    • Achieving perfect follow-through
    • The Cycle System for recording and processing to-do lists
    • Prioritization techniques
    • Task grouping: batching, sharding, and multitasking
    • Handling situations like a big outage disrupting your perfectly planned day
    Additional Materials:

    Deji Akomolafe, Microsoft Applications Virtualization Lead, VMware

    Deji Akomolafe (a CTO Ambassador and Staff Solutions Architect within VMware's Global Field and Partner Readiness Group) specializes in the virtualization of Microsoft Business Critical Applications on the VMware's vSphere platform. Deji is a regular speaker at many industry-leading technical conferences and workshops (including VMworld, SQL Saturday, EMCWorld, and Partners Exchange), presenting technical subject matters related to virtualization and providing technical guidance to help clients enhance their expertise and ability to optimally virtualize and operate their critical applications.

    Cody Chapman, Heraflux Technologies

    Cody Chapman is a Solutions Architect with Heraflux Technologies. His areas of expertise are virtualization, cloud, storage, performance, datacenter architecture, risk mitigation through high availability and disaster recovery, and performing technical exorcisms. He has worked on systems large and small in a wide variety of industries. He is actively working to automate every facet of datacenter and database management. You can read his blog at heraflux.com, and reach him on Twitter at @codyrchapman.

    David Klee, Heraflux Technologies

    David Klee is a Microsoft MVP and VMware vExpert with over seventeen years of IT experience. David is the Founder of Heraflux Technologies, a consultancy focused on data virtualization and performance tuning, datacenter architecture, and business process improvements. You can read his blog at davidklee.net and reach him on Twitter at @kleegeek.

    M5
    Designing Your VMware Virtual Infrastructure for Optimal Performance, Resilience, and Availability—Straight from the Source
    Deji Akomolafe, Microsoft Applications Virtualization Lead, VMware

    Cody Chapman, Heraflux Technologies

    David Klee, Heraflux Technologies

    1:30 pm–5:00 pm

    Constitution Ballroom A

    Downloads:
    M5 Slides (PDF) | M5 Slides (PowerPoint)

    Not very long ago, the very idea of virtualizing production, mission-supporting enterprise applications was so career-threatening that only the brave dared entertain it for longer than a few seconds. Fast forward to now: virtualization is so pervasive and well accepted that the inverse is true—"Virtualize First" is now a standard corporate mandate in large enterprises, and no modern commercial application is exempt.

    Sadly, embracing virtualization has turned out not to be a panacea for everything that ails an enterprise. In fact, virtualization often contributes to sub-optimal performance, availability, recoverability, and agility of many applications in the enterprise—with plenty of frustration, heartburn, reduced productivity, and, yes, interrupted personal lives. In a rush to be part of the "cool crowd," many enterprises fail to identify and account for the intricacies and requirements of the virtualization platform, relegating such considerations to the secondary or tertiary tiers of the "due diligence" scale.

    If you have adopted virtualization as a platform for your mission-critical applications, or if you are in the process of doing so, be sure to attend this tutorial. It will provide a comprehensive and detailed knowledge transfer that enables you to avoid the common pitfalls encountered in a VMware vSphere virtualization project. We will discuss and explain the considerations for successfully running your mission-critical applications in a vSphere-based infrastructure without loss of performance, availability, recoverability, or resilience. The tutorial will go beyond the standard slide-ware and present an actual demonstration of the effects of certain configuration optimization strategies on the overall condition of the virtualized applications and the virtual infrastructure as a whole.

    Who should attend:
    • Infrastructure, Solution and Enterprise Architects
    • Virtual Infrastructure and Applications Administrators
    • Network Administrators
    • IT Operators
    Take back to work:

    The tutorial will be interactive, encouraging questions from participants—so please come in with your own unique and specific questions. The tutorial will provide you with tips and tricks drawn directly from the most current VMware guidance, recommendations, and knowledge-based references, as well as from real-life customer situations.

    Topics include:
    • Virtualization concepts
    • Virtualization stack
    • Hardware abstraction and the relationship and inter-dependencies between the physical and virtual components
    • Pooling and sharing resources in a virtual environment
    • Common assumptions that lead to performance degradation for virtualized applications
    • Configuration optimization that enhances performance
    • Availability and resilience within a VMware vSphere virtual infrastructure
    Additional Materials:
    Tuesday, December 6, 2016

    Full Day

    Jérôme Petazzoni, Docker Inc.

    Jerome works at Docker, where he helps others to containerize all the things. He was part of the team that built, scaled, and operated the dotCloud PAAS, before it became Docker. When annoyed he threatens to replace things with a very small shell script.

    T1
    Deploying and Scaling Applications with Docker Swarm
    Jérôme Petazzoni, Docker Inc.

    9:00 am–5:00 pm

    Constitution Ballroom A

    Docker is an open platform to build, ship, and run any application, anywhere. In this hands-on tutorial, you will learn advanced Docker concepts, and see how to deploy and scale applications using Docker Swarm clustering abilities and other open source tools of the Docker ecosystem.

    This tutorial is living material: it is delivered at least once a month in public sessions all around the U.S. and Europe. Since the Docker platform in general, and Docker Swarm in particular, evolve rapidly, this tutorial evolves as well, following closely the releases of the various components of the Docker ecosystem: Engine, Compose, Swarm, Machine.

    Who should attend:

    Docker users who want production-grade container deployments.

    You should be familiar with Docker and basic Docker commands (docker run, docker ps, and docker stop) as well as the Dockerfile syntax (at least RUN, CMD, and EXPOSE commands). Ideally, you should have experimented with Compose. If you have limited Docker knowledge but consider yourself a quick learner, don't hesitate to attend: there will be numerous examples and demos, and you will be able to test them out on your own Docker cluster!
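
    As a quick self-check of these prerequisites, the sketch below writes a minimal Dockerfile using the RUN, EXPOSE, and CMD instructions and exercises docker build, docker run, docker ps, and docker stop. The base image, image name, and container name are made up for illustration, and the docker commands only run where a Docker engine is actually reachable.

```shell
# Write a minimal Dockerfile exercising RUN, EXPOSE, and CMD.
# The base image and names below are illustrative only.
cat > Dockerfile <<'EOF'
FROM alpine
# RUN executes at image build time
RUN echo "built" > /built.txt
# EXPOSE documents the port the container listens on
EXPOSE 8000
# CMD is the default command when the container starts
CMD ["sh", "-c", "cat /built.txt && sleep 60"]
EOF

# Exercise the basic lifecycle commands, but only where the
# Docker CLI exists and the daemon answers.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    docker build -t demo/prereq .             # build the image
    docker run -d --name prereq demo/prereq   # start a container
    docker ps                                 # list running containers
    docker stop prereq                        # stop it again
fi
```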

    Take back to work:

    After this tutorial, you will know how to deploy applications to production with Docker and containers. We will tackle lots of frequently asked questions in the Docker ecosystem: how to manage the lifecycle of container images, how to implement service discovery across Docker clusters, how to load balance traffic on scaled applications, how to perform security upgrades, and more.

    Topics include:

    Containers, Docker, Orchestration, Scheduling, and Service Discovery

    Additional Materials:

    Branson Matheson, Cisco Systems, Inc.

    Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson holds several credentials and generally likes to spend time responding to the statement "I bet you can't...."

    T2
    Applied Security Tools for Sysadmins
    Branson Matheson, Cisco Systems, Inc.

    9:00 am–5:00 pm

    Constitution Ballroom B

    This course is a direct response to the many requests I have received for “more tools,” and I have written it with an eye to meeting that goal. The class follows a “secure and defend” plan: we will spend the majority of class time learning offensive and defensive tools, then break into teams to secure, and set up monitoring for, the provided on-site test environments. In the second phase, students come to LISA Lab to use the attack tools and defend their environments against their peers, with scheduled times for the teams, either independently or in groups, to deal with staged “incidents.”

    This will be a coordinated event that I will support both in my role as instructor and as a member of LISA Build and LISA Lab, and the second phase will run throughout the conference. I will have some form of visual scorekeeping in LISA Lab where people can walk in and see what's going on with the event. At the end, I will provide prizes and/or accolades for the best teams.

    Who should attend:

    Participants should be beginning-to-advanced system administrators of any stripe with an interest in IT security and a desire to learn how to attack and defend against potential threats in their environments. Participants are required to have experience with the *nix command line, basic networking, and an understanding of virtual environments.

    Take back to work:

    Knowledge of how to evaluate an environment, find vulnerabilities and mitigate them, improve security monitoring, and detect and defend against attacks. Students will learn how to use a working security toolkit that can be applied directly to their home environments.

    Topics include:
    • Basic security concepts and architectural design
    • How to scope and scan an environment using readily available tools and general sysadmin knowledge
    • How to identify, understand, and remediate vulnerabilities, and verify the solution
    • How to monitor and react to incursions
    Additional Materials:

    Michael Schloh, Europalab Networks

    Michael Schloh von Bennewitz is a computer scientist specializing in network engineering, embedded design, and mobile platform development. Responsible for research, development, and maintenance of packages in several community software repositories, he actively contributes to the open source development community.

    Michael speaks four languages fluently and presents at technical events every year. He teaches workshops exclusively on Internet of Things and Embedded Computing technology, traveling with a mobile laboratory of over 300 sensors, actuators, and computer devices.

    Michael's IoT knowledge profits from years of work at telecoms and relationships with industry leaders. He is an Intel innovator, Samsung partner, and Mozilla committer with the mandate to promote IoT technology.

    Additional information can be found at http://michael.schloh.com/

    T7
    Interfacing with a Low Power IoT
    Michael Schloh, Europalab Networks

    9:00 am–5:00 pm

    Back Bay Ballroom D (LISA Lab)

    In this hands-on-hardware workshop, we explore the boundaries of traditional systems and where they converge with networks of billions of embedded devices. Starting with the theory of the Internet of Things, related data transports, and common protocols, we create embedded systems using a set of loaned hardware. Focusing on 802.3, 802.11, and Bluetooth Smart transports, we implement our own IoT edge routers serving our own network of sensor and actuator embedded computers. We will implement a simple messaging application using MQTT or AMQP, and round out the training by integrating our piecemeal solutions into a full-fledged IoT system.

    Who should attend:

    Intermediate hardware or network engineers benefit most from this workshop.

    Take back to work:

    Attendees will take back to work a broad understanding of what will power the next generation of embedded devices and how those devices interface with traditional large Internet systems.

    Topics include:

    Device classes

    • Whirlwind tour of hardware
    • Vendor market trends
    • Small manufacturing

    Transports

    • Copper 802.3
    • Wireless 802.11
    • Bluetooth 1-3
    • Bluetooth Smart
    • Zigbee and ANT+
    • Z-Wave
    • 6LoWPAN
    • LoRa and SigFox

    Protocols

    • Legacy
    • MQTT
    • AMQP
    • CoAP
    • ZeroMQ
    Additional Materials:

    Half Day Morning

    Jeanne Schock

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.

    T3
    Speed up Your Change Control
    Jeanne Schock

    9:00 am–12:30 pm

    Fairfax Room

    Speedy Change Control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risk. In "The Visible Ops Handbook", authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: “change management does not slow things down in these organizations.” This tutorial will help anyone wishing to implement phase one of the Visible Ops Handbook: “Stabilize the Patient” and “Modify First Response.” While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past successes and failures. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert with over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services running on complex technologies across international data centers.

    Who should attend:

    Technical people and managers who participate in a change management process, or who would like to build one but are afraid that doing so will slow them down.

    Take back to work:
    • Templates for change request types and procedures
    • Templates for creating standard operating procedures
    • ITIL-aligned talking points for making your case for these process improvements
    • Better understanding of change management and process in general
    Topics include:
    • Different change types
    • Assessing risks and potential impact
    • Defining change authorities specific for each change type
    • Metrics for measuring change process performance against goals
    • Release and deployment management
    • DevOps
    • Continuous delivery
    Additional Materials:

    Brendan Gregg, Netflix

    Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He was previously a performance lead and kernel engineer at Sun Microsystems, where he developed the ZFS L2ARC and led performance investigations. He has also created numerous performance analysis tools, which have been included in multiple operating systems. His recent work includes developing methodologies and visualizations for performance analysis.

    Sasha Goldshtein, Sela Group

    Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Azure MRS, a Pluralsight author, and an international consultant and trainer. Sasha is a book author, a prolific blogger and open source contributor, and author of numerous training courses including .NET Debugging, .NET Performance, Android Application Development, and Modern C++. His consulting work revolves mainly around distributed architecture, production debugging and performance diagnostics, and mobile app development.

    T5
    Linux Performance Analysis Using Perf and BPF
    Brendan Gregg, Netflix

    Sasha Goldshtein, Sela Group

    9:00 am–12:30 pm

    Commonwealth Room

    This tutorial will give you experience with two powerful Linux performance analysis tools: perf and BPF. Learn how to profile CPU usage, create flame graphs, trace TCP connections, investigate file system latency, explore software internals, and more.

    perf_events, aka "perf" after its front-end, is a Linux mainline tool for profiling and tracing. We will summarize some of its most useful one-liners, and discuss real world challenges and solutions for using it with JIT runtimes (Java, Node.js), and in cloud environments.

    Enhanced BPF (Berkeley Packet Filter) is a new in-kernel programmable runtime with a variety of uses, including extending Linux static and dynamic tracing capabilities. We'll primarily focus on the BPF Compiler Collection (bcc) front-end for BPF, which provides a toolkit of many ready-to-run analysis tools, including DTrace classics like execsnoop, opensnoop, and biolatency, and new tools including memleak, trace, and argdist. bcc also provides Python and C interfaces for writing your own powerful dynamic tracing-based tools, and we'll show how that can be done.

    We will spend more time exploring the new world of BPF and its features that were made available in the Linux 4.4 release. Enhanced BPF has become a recent hotspot for systems innovation, helping create other new technologies including bcc, the kernel connection multiplexer (KCM), and eXpress Data Path (XDP), and is being developed by engineers from many companies, including Facebook, PLUMgrid, Netflix, Cisco, Huawei, GitHub, Sela, and Intel. Join this workshop to get up to speed with BPF for tracing, try some hands-on labs, and gain real experience with the technology from contributor and performance expert Brendan Gregg.
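
    As a taste of the kind of perf one-liners covered, the sketch below captures CPU stack samples and folds them for a flame graph. The stackcollapse-perf.pl and flamegraph.pl paths are assumptions (those scripts come from Brendan Gregg's FlameGraph repository and are assumed to sit in the current directory), and the pipeline is only attempted where perf is installed.

```shell
# Save the classic CPU flame graph pipeline to a helper script.
# The FlameGraph script paths are assumptions; adjust to taste.
cat > profile.sh <<'EOF'
#!/bin/sh
perf record -F 99 -a -g -- sleep 10    # sample on-CPU stacks at 99 Hz, all CPUs
perf script > out.stacks               # dump the samples as text
./stackcollapse-perf.pl out.stacks > out.folded   # fold each stack to one line
./flamegraph.pl out.folded > flame.svg            # render the interactive SVG
EOF
chmod +x profile.sh

# Only attempt the profile where perf exists; it usually also
# needs root, so tolerate failure rather than abort.
if command -v perf >/dev/null 2>&1; then
    ./profile.sh || echo "perf run failed (needs root and the FlameGraph scripts)"
fi
```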

    Topics include:
    • perf
    • Enhanced Berkeley Packet Filter (BPF)
    • BPF Compiler Collection
    • Python and C interfaces to BPF
    Additional Materials:

    Half Day Afternoon

    Chris McEniry, Sony Interactive Entertainment

    Chris "Mac" McEniry is a practicing sysadmin and architect responsible for running a large E-commerce and gaming service. He's been working and developing in an operational capacity for 15+ years. In his free time, he builds tools and thinks about efficiency.

    T4
    Go: The Programming Language
    Chris McEniry, Sony Interactive Entertainment

    1:30 pm–5:00 pm

    Commonwealth Room

    Go is a relatively young language that was built with systems programming in mind. Its compact yet powerful grammar aids the swift development of efficient tools for everyday work. Despite its youth, it has already taken a prominent position among system tools. This hands-on tutorial focuses on reading and writing the Go programming language.

    Who should attend:

    Anyone with a little bit of programming experience who wants to pick up Go

    Take back to work:

    The ability to read and write Go

    Topics include:
    • Control Structures
    • Types
    • Functions
    • Goroutines
    • Channels
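
    To give a flavor of the last two topics, the sketch below writes a tiny program in which a goroutine sends one message over a channel to main. The file name is arbitrary, and the program is compiled and run only where a Go toolchain is installed.

```shell
# A goroutine sending one value over an unbuffered channel.
cat > hello.go <<'EOF'
package main

import "fmt"

func main() {
	ch := make(chan string)   // unbuffered channel of strings
	go func() {               // goroutine runs concurrently with main
		ch <- "hello from a goroutine"
	}()
	fmt.Println(<-ch)         // receive blocks until the send happens
}
EOF

# Run it only where the Go toolchain is available.
if command -v go >/dev/null 2>&1; then
    go run hello.go           # prints: hello from a goroutine
fi
```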
    Additional Materials:

    Jeanne Schock

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About six years ago she transitioned to a role building, managing, and promoting processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience with such processes as Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter, most recently speaking to D.C. chapters of the American Society for Quality, Software Special Interest Group and IEEE Computer Society.

    T6
    Building a Process to Manage the Problems That Cause Incidents
    Jeanne Schock

    1:30 pm–5:00 pm

    Fairfax Room

    All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer—restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.

    The goals of the Problem Management process are to prevent recurrence of incidents, prevent problems and resulting incidents from happening, and minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them be measurably more consistent, mature, and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process. I am a certified ITIL Expert. I designed, implemented, and then managed a problem process for four years at a registry and DNS service provider with complex technologies across international datacenters.

    Who should attend:

    Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.

    Take back to work:
    • A step-by-step guide for building and implementing a problem process and the reasons behind each step
    • A process template with examples that can be easily adapted to fit your organization’s current and future needs
    • Instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
    • Guidance for getting buy-in from peers and managers
    Topics include:
    • Incident response vs. problem resolution
    • Root cause analysis techniques
    • Making decisions that are aligned with business objectives
    • Getting buy-in from teammates, colleagues and managers
    • Proactive problem management
    • After-action reviews as a tool
    • “Root cause” vs. multiple causes
    Additional Materials:
    Monday, February 27, 2017

    Half Day Morning

    Michael Bender, Stony Brook University

    Michael A. Bender is a professor of computer science at Stony Brook University. He was Founder and Chief Scientist at Tokutek, Inc, an enterprise database company, which was acquired by Percona in 2015.

    Bender's research interests span the areas of data structures and algorithms, I/O-efficient computing, scheduling, and parallel computing. He has coauthored over 125 articles on these and other topics. He has won several awards, including an R&D 100 Award, a Test-of-Time award, two Best Paper Awards, and five awards for graduate and undergraduate teaching.

     

    Bender received his B.A. in Applied Mathematics from Harvard University in 1992 and obtained a D.E.A. in Computer Science from the Ecole Normale Superieure de Lyon, France in 1993. He completed a Ph.D. on Scheduling Algorithms from Harvard University in 1998. He has held Visiting Scientist positions at both MIT and King's College London.

    Martin Farach-Colton, Rutgers University

    Martin Farach-Colton is a professor of computer science at Rutgers University. He was Founder and CTO at Tokutek, Inc, an enterprise database company, which was acquired by Percona in 2015.

    Farach-Colton works on pure and applied algorithms in I/O-efficient storage systems, streaming algorithms and string matching. He has coauthored over 150 articles. He has won several awards, including a Sloan Foundation Fellowship, a Test-of-Time award, a Best Paper award, and teaching awards. He was named a distinguished alum of the University of Maryland Computer Science Department on the occasion of their 40th anniversary.

    Farach-Colton received his B.S. in Mathematics and Chemistry from the University of South Carolina in 1984. He received his M.D. from Johns Hopkins in 1988 and his Ph.D. from the University of Maryland in 1991. He has been a Member of Technical Staff at Bell Labs (1997-98) and was an early employee of Google, Inc. (2000-2002).

    Rob Johnson, Stony Brook University

    Rob Johnson is a Research Assistant Professor at Stony Brook University and conducts research on Security, Big Data Algorithms, and Cryptography. He is director of the Security, Programming Languages, And Theory (SPLAT) lab at Stony Brook, the Cryptography Lab at the New York Center for Excellence in Wireless and Information Technology (CEWIT), and the Smart Grid Cyber-Security Testing Lab of the New York Advanced Energy Research and Technology Center (AERTC).

    He does theoretical work with an impact on the real world. He developed BetrFS, a file system that uses recent advances in data structures to improve performance on some operations by over an order of magnitude. He invented the quotient filter, a high-performance alternative to the Bloom filter for Big Data applications. He founded cache-adaptive analysis, a theoretical framework for designing and analyzing algorithms that dynamically share memory with other processes. He broke the High-bandwidth Digital Content Protection (HDCP) crypto-system used in almost all DVD players and TVs. He co-authored CQual, a static analysis tool that has found dozens of bugs in the Linux kernel and has been used to audit the entire Debian Linux distribution for format-string bugs.

    Rob completed his Ph.D. at UC Berkeley in 2006.

    M1
    Write Optimization in Storage Systems
    Michael Bender, Stony Brook University

    Martin Farach-Colton, Rutgers University

    Rob Johnson, Stony Brook University

    9:00 am–12:30 pm

    Write optimization refers to a set of techniques used to improve the performance of databases and file systems. Examples of write optimized data structures include Log-Structured Merge trees (LSMs) and Bε-trees. Systems that use such data structures include BetrFS, HBase, LevelDB, TableFS, TokuMX, TokuDB, etc.

    This tutorial reviews write optimization from the perspectives of analysis and engineering. We provide a framework for understanding which data structures will perform well on which workloads.
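
    As a back-of-the-envelope illustration of the idea (a sketch of ours, not the tutorial's own material), an LSM-style store batches updates in an in-memory buffer and flushes them as immutable sorted runs, trading some read work for cheap, sequential writes:

```python
# Toy sketch of an LSM-style write-optimized store (illustrative only):
# writes land in an in-memory buffer and are flushed to immutable sorted
# runs; reads must consult the buffer and every run, newest first.

class ToyLSM:
    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit
        self.memtable = {}      # in-memory write buffer
        self.runs = []          # list of sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value          # O(1) buffered write
        if len(self.memtable) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Persisting a whole sorted run costs one large sequential write,
        # which is what makes the structure write-optimized.
        run = sorted(self.memtable.items())
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # freshest value wins
            return self.memtable[key]
        for run in self.runs:               # then newest run to oldest
            for k, v in run:
                if k == key:
                    return v
        return None
```

    Real systems bound the read cost by merging (compacting) runs in the background; Bε-trees take a different route, buffering updates inside the nodes of a B-tree and pushing them down in batches.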


    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    M2
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    9:00 am–12:30 pm

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files and thousands of concurrent jobs, and to deliver performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (Hadoop File System), and OpenStack.


    Half Day Afternoon

    Gary A. Grider, Los Alamos National Laboratory

    Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. As Division Leader, Gary is responsible for all aspects of High Performance Computing technologies and deployment at Los Alamos. Additionally, Gary is responsible for managing the R&D portfolio for keeping the new technology pipeline full to provide solutions to problems in the Lab’s HPC environment, through funding of university and industry partners.

    Gary is also the US Department of Energy Exascale Storage, IO, and Data Management National Co-Coordinator. In this role, Gary helps manage the U.S. government investments in Data Management, Mass Storage, and IO. Gary has 30 active patents/applications in the data storage area and has been working in HPC and HPC-related storage since 1984.

    John Bent, Seagate Government Solutions

    John Bent, Chief Architect for Seagate Government Solutions, has been working on storage systems for over 20 years. After completing his data-aware scheduling dissertation at Wisconsin in 2005, John spent the next 10 years working for Gary Grider at LANL designing, maintaining, and measuring some of the world's largest parallel storage systems. Now at Seagate Gov, John helps design and map Seagate storage products, both hardware and systems, to emerging workloads in both Enterprise and Extreme IO.

    Some of John’s more influential research has been the Parallel Log-structured File System and the DOE-sponsored FastForward project prototyping an exascale storage system with Intel and The HDF Group. John is a former anthropology major who spent two years spearfishing on the equator while working as a Peace Corps volunteer.

    M3
    Innovations, Challenges, and Lessons Learned in HPC Storage Yesterday, Today, and Tomorrow
    Gary A. Grider, Los Alamos National Laboratory

    John Bent, Seagate Government Solutions

    1:30 pm–5:00 pm

    In this tutorial, we will introduce the audience to the lunatic fringe of extreme high-performance computing and its storage systems. The most difficult challenge in HPC storage is caused by millions (soon to be billions) of simultaneously writing threads. Although cloud providers handle workloads of comparable, or larger, aggregate scale, the HPC challenge is unique because the concurrent writers are modifying shared data.

    We will begin with a brief history of HPC computing covering the previous few decades, bringing us into the petaflop era which started in 2009. Then we will discuss the unique computational science in HPC so that the audience can understand the unavoidability of its unique storage challenges. We will then move into a discussion of archival storage and the hardware and software technologies needed to store today’s exabytes of data forever. From archive we will move into the parallel file systems of today and will end the lecture portion of the tutorial with a discussion of anticipated HPC storage systems of tomorrow. Of particular focus will be namespaces handling concurrent modifications to billions of entries as this is what we believe will be the largest challenge in the exascale era.

    The tutorial will end with a free-ranging, audience-directed panel.

    Topics include:
    • A brief history lesson about the past 30 years of supercomputers
    • An understanding of what makes HPC computing unique and the entailing storage challenges
    • An overview of current HPC storage technologies such as burst buffers, parallel file systems, and archival storage
    • A glimpse into the future of HPC storage technologies for both hardware and software
    • Insights into unique research opportunities to advance HPC storage

    Andy Rudoff, Data Center Group, Intel Corporation

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    M4
    Persistent Memory Programming: Challenges and Solutions in Multiple Languages
    Andy Rudoff, Data Center Group, Intel Corporation

    1:30 pm–5:00 pm

    Both Windows and Linux now contain support for Persistent Memory, an emerging non-volatile memory (NVM) technology. Persistent Memory is available today in the form of NVDIMMs and is expected to explode in capacity in the near future. Unlike other NVM technologies, such as SSDs, Persistent Memory provides a byte-addressable programming model, allowing direct memory access like DRAM, but retaining its contents across power loss. Technologies such as Intel’s 3D XPoint are expected to provide terabytes of NVM per CPU socket, with performance near DRAM speeds. The result offers applications a new tier for data placement in addition to the traditional memory and storage tiers: the persistent memory tier. While there are numerous ways for an OS to leverage Persistent Memory in a way that is transparent to the application, converting an application to be "persistent memory aware" will allow the highest performance benefit.

    This tutorial will start with the basic SNIA NVM Programming Model used by operating systems to expose Persistent Memory to applications. We will walk through code examples showing how applications get access to Persistent Memory and we will pay special attention to safe programming practices such as flushing to persistence, atomic operations, and writing power-fail safe code. We will look at CPU instructions designed for atomic operations, cache flushing, and fencing, and how they interact with Persistent Memory.

    Next, the tutorial will provide a brief survey of available libraries, compilers, and research in this area. We will then walk through some more complex examples of persistent memory programming in C, C++, and Java. Using the open source NVM Libraries from http://pmem.io we will show how to solve the common programming pain points and how the higher-level languages can help avoid common persistent memory programming mistakes.
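
    To illustrate the flush-to-persistence discipline described above without persistent-memory hardware or libraries, here is a minimal Python sketch of ours using an ordinary memory-mapped file; `mmap.flush()` stands in for a true persistence primitive such as libpmem's pmem_persist, and the file name is arbitrary:

```python
# Illustrative "write, then explicitly flush to persistence" pattern,
# using plain mmap + msync as a stand-in for persistent-memory
# primitives. On real hardware this would be a DAX-mapped pmem region.
import mmap
import os
import struct

PATH = "pmem_demo.bin"

# Create a small zero-filled backing file to map.
with open(PATH, "wb") as f:
    f.write(b"\0" * 4096)

with open(PATH, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    # Stores are only guaranteed durable after an explicit flush; until
    # then they may sit in CPU caches or the page cache.
    m[0:8] = struct.pack("<Q", 42)
    m.flush()              # analogous to pmem_persist(addr, len)
    m.close()

# Reopen and verify the value survived.
with open(PATH, "rb") as f:
    value, = struct.unpack("<Q", f.read(8))
print(value)               # prints 42
os.remove(PATH)
```

    The power-fail-safety topics in the tutorial (atomic operations, fencing, cache-flush instructions) address what happens if failure strikes between the store and the flush, which this portable sketch cannot demonstrate.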

    Topics include:
    • The SNIA NVM Programming Model
    • How the Intel Architecture Supports Persistent Memory
    • The Challenges of Persistent Memory Programming
    • The Current State of the Persistent Memory Ecosystem
    • Programming Using the NVM Libraries from http://pmem.io
    • C, C++, and Java Persistent Memory Programming Techniques
    Sunday, October 29, 2017

    Half Day Morning

    Ryan Jarvinen, Red Hat

    Ryan Jarvinen is an Open Source Advocate at CoreOS, focusing on improving developer experience in the container community. He lives in Oakland, California, and is passionate about open source, open standards, open government, and digital rights. You can reach him as ryanj on Twitter, GitHub, and IRC.

    S1
    Hands-On Intro to Kubernetes
    Ryan Jarvinen, Red Hat

    9:00 am–12:30 pm
    Bayview Room

    Kick off your journey to becoming a DevOps master by learning Kubernetes from the ground up. Get started with an introduction to distributed systems and the architecture behind Kubernetes; then learn about Kubernetes APIs and API object primitives. By the end of this workshop you’ll be deploying, scaling, and automating container-based solutions using open source tools for distributed computing.

    Slides: http://bit.ly/lisa17-k8s. Bring a laptop with the following materials: http://bit.ly/lisa17-k8s#/workshop-setup.
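
    As a taste of the API object primitives covered, a minimal Deployment manifest looks roughly like this (the names and image here are illustrative placeholders, not workshop material):

```yaml
# Minimal illustrative Deployment: three replicas of a web container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3                 # declarative scaling: desired pod count
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:stable
        ports:
        - containerPort: 80
```

    Applying it with `kubectl apply -f deployment.yaml` asks the cluster to converge on three running replicas; `kubectl scale deployment hello-web --replicas=5` changes that desired count.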

    Who should attend:

    Developers, systems administrators, "DevOps" folks, architects, and anyone interested in learning about distributed systems via hands-on examples. Attendees should have some basic knowledge of Linux containers (Docker) and an interest in using distributed architectures to develop web solutions.

    Take back to work:

    Attendees will learn how to deploy, scale, update, and manage container-based solutions through hands-on examples and exercises.

    Topics include:

    Kubernetes, Distributed computing and solutions delivery, SRE, container operations


    Mark Lamourine, Red Hat

    Mark Lamourine fell into system administration when the VAX shop he worked in as a student inherited a set of HP/UX boxes. He became the de-facto admin because he was the only one in the group who had read a man(8) page. Since then he's done stints as a developer, a QA engineer, a lab infrastructure manager and infrastructure admin at a now-defunct world-wide ISP. These days he plays the Sysadmin Advocate to software developers who think software is done when they've installed it once in Vagrant.

    When not computer geeking Mark geeks on road bicycles. He's been riding road fixed-gear for fun since before that was a thing.

    S4
    Underpinnings—Container Hosts: CoreOS and Project Atomic
    Mark Lamourine, Red Hat

    9:00 am–12:30 pm
    Marina Room

    In this talk I outline the characteristics that define a "container host": an OS tuned to run software in containers. We will explore the benefits and peculiarities of a stripped-down, lightweight, minimal OS image and the implications for configuration management and update strategies.

    Then I explore the architecture of two common container hosts, CoreOS and Project Atomic. Each has characteristics that make it suitable for different environments. Users will install one of the two environments and follow along, probing and observing how a container host differs in operation from a conventional package-based host.

    Finally, I will look at how a sysadmin's day-to-day tasks and operations differ when running infrastructure services and providing application runtime environments for developers and users on container hosts. We will establish base network services (DNS, NTP, authentication) on container hosts, as well as install and demonstrate utility containers that provide the standard admin tools stripped from lightweight hosts.

    Who should attend:

    Sysadmins and service designers interested in learning to use container hosts to reduce host management.

    Participation requires access to local or cloud VM service.

    Take back to work:

    Attendees will understand the goals and basic design requirements for container hosts. They will get an overview of the design of both CoreOS and Atomic host, highlighting the differences in architecture and how these inform the choice of container host for an installation.

    They will learn how to boot and integrate container hosts into their existing infrastructure. They will know how to install and use traditional host tools from containers and how to manage, update and customize container hosts.

    They will create a sample cluster of either CoreOS or Atomic hosts in a demo environment.

    Topics include:
    • Container Hosts
    • Large Scale Container Infrastructure
    • Atomic Host and CoreOS architecture

    George Beech, Stack Exchange

    George has been an SRE generalist at Stack Exchange since October 2011. Before that, he worked for a multinational CRM company running their IVR infrastructure. He has worked on every part of the stack, from Windows to Linux to the network infrastructure. He is currently serving his first term as a LOPSA Director. His experience working in the IT field over more than a decade has led him to love working with multiple technologies, and has allowed him to experience everything from running a small network as a consultant to being part of a large team running very large-scale infrastructure.

    In the past he has spoken at LISA, Velocity NYC, local user groups, and LOPSA-EAST, and has written about his experience working on a high-volume web infrastructure on his personal blog and the Server Fault blog.

    S5
    Automating Azure with Resource Manager Templates
    George Beech, Stack Exchange

    9:00 am–12:30 pm
    Garden Room

    This class will teach administrators how to get a project up and running with Azure Resource Manager templates. These templates are an easy way to define, manage, and deploy instances into the Azure cloud. Additionally, I will go over some basic best practices for making your templates more manageable.
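
    For a flavor of what such a template looks like, here is a minimal sketch (the resource, API version, and parameter names are illustrative, not taken from the class materials), showing the parameters, variables, and resources sections working together:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": {
      "type": "string",
      "metadata": { "description": "Globally unique storage account name" }
    }
  },
  "variables": {
    "location": "[resourceGroup().location]"
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('storageAccountName')]",
      "apiVersion": "2016-01-01",
      "location": "[variables('location')]",
      "sku": { "name": "Standard_LRS" },
      "kind": "Storage",
      "properties": {}
    }
  ]
}
```

    A template like this is deployed with Azure PowerShell or the Azure CLI, with parameter values supplied at deployment time, so one generalized template can serve many environments.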

    Who should attend:

    System administrators who are new to Azure or have not worked with Resource Manager templates in the past, and anyone interested in streamlining and automating his or her workflow in the Azure cloud.

    Take back to work:

    Attendees will take back the basic skills to get started automating their Azure deployments, along with the baseline knowledge to work with ARM templates.

    Topics include:
    • Azure Resource Manager
    • Azure Powershell/Azure CLI
    • Basics of the ARM Template layout
      • Metadata
      • Parameters
      • Variables
      • Template file
    • Using Parameters, and variables to generalize your deployment
    • Adjusting resource sizing on the fly
    • Deploying Resources
      • Base resource
      • Sizing
      • Monitoring Configuration
    • Resource Dependencies
    • Troubleshooting Templates
    • Tips and tricks to help you configure templates

    Full Day

    S2
    Everything an Administrator Wanted to Know about Operating System Internals but Was Afraid to Ask
    9:00 am–5:00 pm
    Seacliff AB Room

    Insufficient knowledge of operating system internals is my most common reason for passing on an interview candidate. Anyone can learn that you run tool X to fix problem Y. But what happens when there is no tool X, or when you can't even accurately pinpoint the root cause of "It's sometimes slow"?

    This will be a no-holds-barred, fury-road-paced review of all major parts of modern operating systems with specific emphasis on what's important for system administrators. It will provide just enough of an academic focus to bridge the "whys" so you can make better use of fiddling with the "whats" on a day-to-day basis. As an added bonus, it will prime you for the following day's "Linux Performance Tuning" tutorial with Theodore Ts'o.

    You will learn about process management, scheduling, file system architecture and internals, interrupt management, the mysteries of the MMU and TLB, Bélády's anomaly, page replacement algorithms, and hopefully a bit of networking. In a nutshell, we'll cover 16 weeks of college-level material in a few hours.

    Buckle up.
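
    To make one of those topics concrete, here is a small sketch of ours illustrating Bélády's anomaly: under FIFO page replacement, giving the classic reference string four frames instead of three produces more page faults, not fewer.

```python
# FIFO page replacement: count page faults for a reference string.
from collections import deque

def fifo_faults(refs, nframes):
    frames = deque()               # oldest resident page at the left
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()   # evict the oldest resident page
            frames.append(page)
    return faults

# Classic reference string exhibiting Bélády's anomaly:
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults with 3 frames
print(fifo_faults(refs, 4))   # 10 faults with 4 frames: more memory, more faults
```

    Stack algorithms such as LRU cannot exhibit this anomaly, which is one reason the distinction between replacement policies matters in practice.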


    Tyler Fitch, Adobe

    Tyler is a Site Reliability Engineer for the Adobe Stock site—working to automate all the things done to build and release changes to the Stock platforms. He recently finished three years of "post graduate work" in DevOps as an Architect in Chef's Customer Success Program where he helped Chef's largest enterprise customers have delightful experiences in IT Automation. He lives in Vancouver, Washington, and when he’s not programming enjoys lacrosse and using his passport.

    S3
    Automation Tools Bootcamp
    Tyler Fitch, Adobe

    9:00 am–5:00 pm
    Seacliff C Room

    The Automation Tools Bootcamp is a tutorial for individuals looking for exposure to and usage of new IT automation tools. We will learn about and then use Vagrant, Chef, Packer, Docker, Terraform, and Artifactory to deploy a small application in local VMs.

    We will cover a progression of tasks, leveraging information from previous sections to deploy a small app that runs identically on your local development machine or on a shared server. Get rid of the “it works for me” mentality when you know your local VM is identical to your co-workers' and your shared environments.

    Who should attend:

    Operations, QA, those who choose to call themselves DevOps, and even managers can come learn.

    Take back to work:

    These automation tools are freely available to engineers, enabling them to safely break local environments until the change in configuration has been perfected. Basic exposure to these tools will allow attendees to return to work with new ways to tackle the problems they face daily.

    Topics include:

    Vagrant, Chef, Packer, Docker, Terraform, and Artifactory


    Half Day Afternoon

    Michal Sekletar, Red Hat

    Michal Sekletar joined Red Hat in 2011 and currently works as a Senior Software Engineer on the "Plumbers" team. He spends his days working on and supporting init systems and other low-level user-space components. He holds a Master's degree from Brno University of Technology. His other professional interests include programming languages, algorithms, and UNIX-like (other than Linux) operating systems.

    S6
    Service Management with systemd
    Michal Sekletar, Red Hat

    1:30 pm–5:00 pm
    Bayview Room

    Tasks like the management and maintenance of business-critical services are on the daily to-do list of every system administrator. Moreover, containers and microservice-based architectures mean that the number of services a sysadmin has to manage is ever growing. To successfully manage thousands of services, we need smart tools that can help us. In this session, we will look at systemd, the init system and service manager used by all major Linux distributions. The session will be a hands-on, interactive look at the architecture, capabilities, and administrative how-tos of systemd. Anyone who is new to systemd or looking to dig deeper into some of the advanced features should attend. Please bring a laptop with a virtual machine running a distribution of your choice that uses systemd.
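
    For reference, a minimal service unit, the kind of file this session dissects, might look like this (the service name, binary path, and config path are illustrative):

```ini
# /etc/systemd/system/myapp.service -- illustrative example unit
[Unit]
Description=Example application service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
Restart=on-failure
User=myapp

[Install]
WantedBy=multi-user.target
```

    After placing the file under /etc/systemd/system/, `systemctl daemon-reload` followed by `systemctl enable --now myapp.service` starts the service and enables it at boot, and `journalctl -u myapp` shows its logs.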

    Who should attend:

    Linux system administrators, package maintainers and developers who are transitioning to systemd, or who are considering doing so.

    Take back to work:

    Understanding of how systemd works, where to find the configuration files, and how to maintain them.

    Topics include:
    • The basic principles of systemd
    • systemd's major components
    • Anatomy of a systemd unit file
    • Understanding and optimizing the boot sequence
    • Improved system logging with the journal
    • Resource management via systemd's cgroups interface
    • Simple security management with systemd and the kernel's capabilities
    • systemd, containers, and virtualization
    S7
    One Metrics Framework to Rule Them All
    1:30 pm–5:00 pm
    Garden Room

    Have you ever wondered how to find the "one metric that matters" for your team? Or how to magically communicate why your team is doing what you're doing so everyone can understand? Or, moving back several steps: how should you decide which work to focus on? This tutorial isn't a magic pill, but it's the closest thing to one for answering all of those questions. And once you learn the framework, you'll be able to sketch it out on the back of a napkin.

    I’ve used this simple framework to help:

    • Fortune 500 executives decide on the right metrics for their latest initiatives and communicate them throughout the organization
    • Sysadmins communicate their latest improvement work across their own teams and to “the business”

    I’ve also used it in my own research, ranging from complex hardware studies to the State of DevOps Reports.

    The framework works for all types of measures: system, survey, technical, financial, etc.

    Who should attend:

    Engineers, managers, anyone needing to plan or understand a system.

    Take back to work:

    When you leave this tutorial, you’ll be able to:

    • Communicate your measurement framework in a straightforward manner
    • Identify key measures for your own improvement work, and share this easily with the data team (whether that’s you or another team)
    • Chain your measurement frameworks, allowing you to link executive-level initiatives to middle management goals to practitioner workstreams
    Topics include:

    Metrics

    Additional Materials:

    Carol Smith, Microsoft

    Carol Smith has over 12 years of experience with programs, communities, and partnerships. She worked at GitHub managing education partnerships for the Student Developer Pack and at Google managing the Google Summer of Code program. She has a degree in Journalism from California State University, Northridge, and is a cook, cyclist, and horseback rider.

    Heidi Waterhouse, Consultant

    Heidi Waterhouse is a freelance technical writer, information architect, and active conference speaker. Her experience as an in-demand consultant has given her insight into the interview process across several industry segments and allows her to generate meaningful answers to a wide variety of weird interview questions. In her spare time, she considers the technical writing aspects of sewing patterns.

    S8
    The Hardest Problem in Tech(nical Interviewing) Is People: The Personal Skills in Interviewing
    Carol Smith, Microsoft

    Heidi Waterhouse, Consultant


    1:30 pm–5:00 pm
    Marina Room

    Technical interviews can be intimidating, but it’s easier if you have confidence in yourself and your ability to answer complicated questions. The hardest questions are not about sorting algorithms, but how you’ll work in a team, how you’ll resolve conflicts, and what it will be like to manage and work with you. This workshop exists to address the skills and theories of presenting yourself as confident, capable, and coachable.

    Who should attend:

    We envision the audience for this tutorial to be people interviewing for technical or technical-adjacent roles at technology companies who are early career (2-7 years). It is meant for beginners, but all are welcome if they want to brush up on their interviewing skills.

    Take back to work:

    The audience will experience hands-on practice, and can expect to learn tactics for preparing for and excelling at interviews. We will provide handouts for participants to use after the workshop and for practice. Participants will learn how to accomplish the checkpoints of a hiring workflow, including: phone screens, phone interviews, in-person interviews, and how to accept or reject an offer. The take-home worksheets will provide types of interview questions, job search rubric, self-evaluation forms, and resources for further research.

    Topics include:

    Culture, Interviewing, Career, Early Career, Technology Industry

    Additional Materials:
    Monday, October 30, 2017

    Full Day

    Branson Matheson, Cisco Systems, Inc.

    Branson is a 29-year veteran of system architecture, administration, and security. He started as a cryptologist for the US Navy and has since worked on NASA shuttle and aerospace projects, TSA security and monitoring systems, secure mobile communications, and Internet search engines. He has also run his own company while continuing to support many open source projects. Branson speaks to and trains sysadmins and security personnel worldwide, and he is currently a senior technical lead for Cisco Cloud Services. Branson has several credentials and generally likes to spend time responding to the statement "I bet you can't...."

    M1
    Defending against the Dark Arts
    Branson Matheson, Cisco Systems, Inc.


    9:00 am–5:00 pm
    Bayview Room

    Today's threats to the enterprise are manifested in many ways, but all share similar traits: highly intelligent, well-funded, and determined to gain access. In this class, we will explore the murky world of the black-hats. We will examine your security footprint as they view it and discuss ways to minimize it, various vectors for attack, and how to detect and defend. We will spend time talking about current threats and how they can impact your company, and we will build upon the foundations of good security practice. This class has been updated with current events and topics related to environment profiling, social engineering, and new attack vectors. As with all my classes, this will be accompanied with a pinch of humor and a large dollop of common sense.

    Who should attend:

    Participants should be beginning- to mid-level system administrators of any stripe with an interest in IT security and a desire to understand their potential adversaries. It is suggested that participants have experience with the *nix command line and virtual hosts.

    Take back to work:

    Tools, tips, tricks, and a working security toolkit which can be implemented to improve monitoring, detection, and defense in your organization. Experience working with (mostly) free security software tools.

    Topics include:

    Security, Risk Evaluation, Social Engineering

    Additional Materials:

    Half Day Morning

    M2
    The Accidental DBA
    9:00 am–12:30 pm
    Garden Room

    Open source relational databases like MySQL and PostgreSQL power some of the world's largest websites, including Yelp. They can be used out of the box with few adjustments and rarely require a dedicated database administrator for the first few months or even years. This means that system administrators and site reliability engineers are usually the first to respond to some of the more "interesting" issues that can arise as you scale your databases. This tutorial will cover MySQL, but many of the concepts apply to PostgreSQL and other open source RDBMSs. We'll first go over a broad set of DBA basics to introduce MySQL database administration, then cover the InnoDB storage engine, database defense, and monitoring. Finally, I'll cover the wide array of online resources, books, open source toolkits, and scripts from MySQL, Percona, and the open source community that will make the job easier.

    Who should attend:

    Sysadmins and SREs of all levels who have an interest or need to learn MySQL or supporting an open source relational database.

    Take back to work:

    Sysadmins and SREs who join us for this tutorial will come away with a real-world, production-ready understanding of why and how MySQL works the way it does.

    Topics include:
    • MySQL Installation and Configuration
    • Architecture and Filesystem Layout
    • InnoDB Tuning and Optimization
    • Transactions
    • Replication and Scaling Out
    • Schema/Query Basics, Indexes, and Query Plans
    • Deciphering Common Errors
    • Monitoring
    • Backup and Restore
    • Troubleshooting
    • Online Communities
    • Open Source Toolkits
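
    As a small taste of the query-plan material above, MySQL's EXPLAIN statement shows whether a query will use an index; the table and column names here are hypothetical:

```sql
-- Does this query use the index on `email`, or scan the whole table?
EXPLAIN SELECT id, email
FROM users
WHERE email = 'user@example.com';
-- In the output, check the `type` and `key` columns: `ref` with a named
-- key means an index lookup; `ALL` with a NULL key means a full table scan.
```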
    Additional Materials:

    Robert Ballance, Independent Computer Scientist

    Dr. Robert Ballance recently completed a White House Presidential Innovation Fellowship where he applied his skills with R to analyzing and delivering broadband deployment data to communities across the U.S.A. He first developed his R-programming skills while managing large-scale High-Performance Computing systems for Sandia National Laboratories. While at Sandia, he developed several R packages used internally for system analysis and reporting. Prior to joining Sandia in 2003, Dr. Ballance managed systems at the University of New Mexico High Performance Computing Center. He has consulted, taught, and developed software, including R packages, Perl applications, C and C++ compilers, programming tools, Internet software, and Unix device drivers. He is a member of USENIX, the ACM, the IEEE Computer Society, the Internet Society, and the American Association for the Advancement of Science. He was a co-founder of the Linux Clusters Institute and recently served as Secretary of the Cray Users Group. Bob received his Ph.D. in Computer Science from U.C. Berkeley in 1989.

    M3
    Introduction to R for System Administrators
    Robert Ballance, Independent Computer Scientist


    9:00 am–12:30 pm
    Seacliff C Room

    The R programming language and ecosystem constitute a rich tool set for performing system analyses, for communicating the results and importance of those analyses, and for automating the process with reproducible and repeatable results. This brief introduction to R and its ecosystem will provide a walk along the mainline — coming up to speed on R, accessing data, and getting results.

    This tutorial will

    • motivate you to pick up R
    • introduce the basics of the R language
    • demonstrate useful techniques using R and RStudio
    • illustrate ways to simplify your life by automating data analysis and reporting

    In-class demonstrations will be complemented with hands-on opportunities during the workshop. Additional exercises and data sets that students can explore following the workshop will be provided.
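
    As a flavor of what "accessing data and getting results" looks like, a few lines of R using the dplyr package can summarize an operational data set; the file name and column names below are hypothetical:

```r
# Read a CSV of per-host response times and summarize by host.
library(dplyr)

times <- read.csv("response_times.csv")
times %>%
  group_by(host) %>%
  summarise(p95_ms = quantile(latency_ms, 0.95), requests = n())
```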

    Who should attend:

    This tutorial is designed for system administrators who are awash in operational data and who want to do a more efficient job of understanding their data and communicating their findings to others. Some facility with programming and a knowledge of basic descriptive statistics are assumed. Prior knowledge of R is not required.

    Take back to work:
    • Understanding where R fits into the system administrator’s tool set
    • Acquaintance with R, R packages, and R Studio
    • Familiarity with basic R data-manipulation techniques
    • Motivation to learn or improve your R skills
    • Next steps to take in mastering R
    Topics include:

    Analytics of System Data

    Additional Materials:

    Christopher DeMarco, Rhiza, a Nielsen Company

    Sysadmin-turned-IT-turned-DevOps, I've been coding automation since 1998.

    M4
    Deployment and Orchestration with Terraform
    Christopher DeMarco, Rhiza, a Nielsen Company


    9:00 am–12:30 pm
    Seacliff AB Room

    Terraform is a tool for deploying and configuring cloud infrastructure in AWS, Google Compute Engine, Digital Ocean, Azure, and many, many other platforms. It is a consistent, robust, well-maintained alternative to clicking in a web interface or writing custom provisioning code against the cloud provider's API.

    This tutorial will show code and runtime examples of deploying various types of cloud infrastructure in AWS, Google Compute Engine, and others. Interactivity is unfortunately not offered due to the logistics of billing for arbitrary cloud resources.
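
    For a sense of what that code looks like, a minimal Terraform configuration for a single AWS instance might read as follows; the AMI ID and names are placeholders:

```hcl
# Minimal sketch: one AWS instance (all values are illustrative).
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "terraform-example"
  }
}
```

    `terraform init`, `terraform plan`, and `terraform apply` then create the instance; `terraform destroy` tears it down again.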

    Who should attend:

    Novice- to intermediate-level sysadmins who want to learn what Terraform is and what it's good for, why you'd use it instead of your cloud provider's web interface or API, and how to implement common patterns across several different providers.

    Take back to work:

    What is Terraform? What is it good for? How do we use it to build/manage infrastructure? How do we scale it to a team?

    Topics include:

    Terraform

    Additional Materials:

    Tom Limoncelli, Stack Overflow, Inc.

    Tom is the SRE Manager at StackOverflow.com and author of Time Management for System Administrators (O'Reilly). He is co-author of The Practice of System and Network Administration (3rd edition just released) and The Practice of Cloud System Administration. He is an internationally recognized author, speaker, system administrator, and DevOps advocate. He's previously worked at small and large companies including Google, Bell Labs/Lucent, and AT&T. His blog is http://EverythingSysadmin.com and he tweets @YesThatTom.

    M5
    Personal Time Management: The Basics for Sysadmins That Are Overloaded
    Tom Limoncelli, Stack Overflow, Inc.


    9:00 am–12:30 pm
    Marina Room

    Whether you are a sysadmin, dev, or web ops, time management can be more difficult than any technology issue. This class is for new and junior system admins who have found themselves in over their heads, overloaded, and looking for a better way to survive the tech world.

    This tutorial presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, prioritization techniques so the projects you do work on have the most impact, plus "The Cycle System," which is the easiest and most effective way to juggle all your tasks without dropping any.

    Who should attend:

    Sysadmins, devs, operations, and their managers

    Take back to work:

    By the end of this class, you will be able to schedule and prioritize your work (rather than be interruption-driven), have perfect follow-through (never forget a request), and limit your work-time to 40 hours a week (have a life).

    Topics include:
    • How to manage all the work you have to do.
    • How to prioritize and eliminate unnecessary tasks.
    • Manage interruptions: prevent them when possible, and handle the ones you get.
    • The Cycle System for recording and processing to-do lists
    • Task grouping: batching, sharding, and multitasking
    Additional Materials:

    Half Day Afternoon

    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company and a Top Level Domain Registry services and DNS provider. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    M6
    Building a Process to Manage the Problems that Cause Incidents
    Jeanne Schock, Armstrong Flooring Inc.


    1:30 pm–5:00 pm
    Marina Room

    All too often, technical teams spend so much time firefighting that they can’t stop to identify and eliminate the problems—the underlying causes—of incidents. Incident resolution is about taking care of the customer—restoring a service to normal levels of operation ASAP. Without a process in place to turn the problem into a known error, the root causes of the incident remain, resulting in recurrences of the incident.

    The goals of the Problem Management Process are to prevent repeat incidents and to minimize the impact of incidents and problems that cannot be prevented. Most technical people already have experience in root cause analysis and problem resolution. This tutorial will help them to be measurably more consistent, mature and effective in their practices. Using IT Infrastructure Library (ITIL) best practices, this tutorial will deliver step-by-step instructions on building and managing a problem process.

    Who should attend:

    Technical people and managers responsible for the support of live production services. This is an operational support process that can be put in place from the bottom up. The more teams involved in the process—DBAs, system administrators, developers, helpdesk—the greater the scope of problems that can be addressed.

    Take back to work:
    • a step-by-step guide for building and implementing a problem process and the reasons behind each step
    • a process template with examples that can be easily adapted to fit your organization’s current and future needs
    • instructions on setting up a Known Error Database and communicating workarounds to impacted support teams
    • guidance for getting buy-in from peers and managers
    • a complete kit for starting to use After Action Reviews to handle the human component of problems
    Topics include:
    • Incident response vs. problem resolution
    • Root cause analysis techniques
    • Making decisions that are aligned with business objectives
    • Getting buy-in from teammates, colleagues and managers
    • Proactive problem management
    • After-action reviews
    Additional Materials:

    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PaaS, and performed various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.
    M7
    Getting Started with Docker and Containers
    Jérôme Petazzoni, Docker Inc.
    1:30 pm–5:00 pm
    Garden Room

    If you still haven't checked that Docker thing, but need (or want) to get started with containers, this tutorial is for you!

    After a short introduction explaining various usage scenarios for containers, we will roll up the sleeves of our T-shirts, and run a few simple containers through the Docker CLI. We will explain the difference between containers and images, and write a Dockerfile to build an image for a trivial application. Finally, we will present Compose, a tool to build, run, and manage stacks with multiple containers.
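
    For a taste of the image/container distinction, here is a trivial Dockerfile for a hypothetical Python script, app.py:

```dockerfile
# Build an image for a hypothetical single-file Python application.
FROM python:3-alpine
COPY app.py /app.py
CMD ["python", "/app.py"]
```

    `docker build -t myapp .` produces an image; `docker run myapp` starts a container from it.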

    No prior knowledge of Docker is needed. If you know how to interact with the UNIX command line, you're set! Some demos will feature code snippets in Python, Ruby, or even C; but you will be perfectly fine even if your language of choice is Bash.

    Advanced topics like networks, volumes, plugins, multi-stage builds, health checks, etc. will be mentioned but not covered in depth.

    The tutorial will be hands-on. You will be provided with a pre-configured Docker environment running on a cloud VM (you won't need to setup Docker or Vagrant or VirtualBox on your machine).

    Who should attend:

    Devs and ops who have managed to avoid the container hype so far but now want to catch up on all that Docker jazz

    Take back to work:

    The audience will learn about the basic principles of containers: what they are, what they're for, why they have been trending the last few years.

    They will also learn how to use the Docker CLI to run simple containers; build container images with Dockerfiles; start multi-container applications with Docker Compose.

    This will allow them to understand containers in general and Docker in particular; use them in simple scenarios; and have a reference point for more complex ones.

    Topics include:

    Docker, containers

    Additional Materials:

    Christopher DeMarco, Rhiza, a Nielsen Company

    Sysadmin-turned-IT-turned-DevOps, I've been coding automation since 1998.

    M8
    Automation with Ansible
    Christopher DeMarco, Rhiza, a Nielsen Company


    1:30 pm–5:00 pm
    Seacliff AB Room

    Ansible is a fantastic starting point for automation—either when the learning curve or the infrastructure around Chef/Puppet is too high. New users can start writing useful automation playbooks with just an SSH connection and an hour (or two) reading the docs.
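
    To give a sense of that low barrier to entry, a first playbook might look like the following sketch; the host group and package are illustrative:

```yaml
# site.yml -- minimal playbook sketch (host group and package are examples)
- hosts: webservers
  become: true
  tasks:
    - name: Ensure nginx is installed
      package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      service:
        name: nginx
        state: started
        enabled: true
```

    `ansible-playbook -i inventory site.yml` applies it over plain SSH; the targets need nothing beyond Python.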

    This tutorial will alternate between lecture and hands-on activities using (instructor-supplied) disposable cloud infrastructure.

    Who should attend:

    Sysadmins with zero exposure to Ansible through intermediate-level users who want a guided tour of its potential.

    Take back to work:

    Knowledge of what Ansible is, how it works, and how it compares with other configuration-management tools; hands-on experience using Ansible to solve real-world problems; and opinionated best-practices for saving blood, sweat, and/or tears.

    Topics include:

    Ansible

    Additional Materials:

    Courtney Eckhardt

    Courtney Eckhardt first got into retrospectives when she signed up for comp.risks as an undergrad (and since then, not as much has changed as we’d like to think). Her perspectives on engineering process improvement are strongly informed by the work of Kathy Sierra and Don Norman (among others).

    M9
    "I Never Want to Live through This Again!": Running Excellent Retrospectives
    Courtney Eckhardt


    1:30 pm–5:00 pm
    Seacliff C Room

    Your site’s back up, you’re back in business. Do you have a way to make sure that problem doesn’t happen again? And if you do, do you like how it works?

    Heroku uses a blameless retrospective process to understand and learn from our operational incidents. We’ve recently released the templates and documentation we use in this process, but experience has taught us that facilitating a retrospective is a skill that’s best taught person to person.

    This tutorial will take you through a retrospective based on the internal and external communications of a real Heroku operational incident. We’ve designed it to help you experience first-hand the relaxed, collaborative space that we achieve in our best retrospectives. We’ll practice tactics like active listening, redirecting blame, and reframing conversations. Along the way, we’ll discuss how we developed this process, what issues we were trying to solve, and how we’re still iterating on it.

    Who should attend:

    Managers, tech leads, anyone interested in retrospective culture and iterating on processes.

    Take back to work:

    Attendees will have the materials and firsthand experience to advocate for (or to begin) an incident retrospective process at their workplace, or to improve a process they might already be using.

    Topics include:
    • Why run a retrospective
    • Goal of a retrospective
    • Blameless retrospectives
    • Facilitating: redirecting blame, reframing, drawing people out
    • How to structure a retrospective
    • Preparing for a retrospective
    • Five "why"s / infinite "how"s
    • Human error
    • Achieving follow-through on remediation items
    Additional Materials:
    Tuesday, October 31, 2017

    Full Day

    Theodore Ts'o, Google

    Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian developer, and is the maintainer of the ext4 file system in the Linux kernel. He is the maintainer and original author of the e2fsprogs userspace utilities for the ext2, ext3, and ext4 file systems.

    T5
    Linux Performance Tuning
    Theodore Ts'o, Google


    9:00 am–5:00 pm

    Bayview Room

    Who should attend:

    Intermediate and advanced Linux system administrators who want to understand their systems better and get the most out of them.

    Take back to work:

    The ability to hone your Linux systems for the specific tasks they need to perform.

    Topics include:
    • Strategies for performance tuning
    • Characterizing your workload's requirements
    • Finding bottlenecks
    • Tools for measuring system performance
    • Memory usage tuning
    • Filesystem and storage tuning
    • Network tuning
    • Latency vs. throughput
    • Capacity planning
    • Profiling
    • Memory cache and TLB tuning
    • Application tuning strategies
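
    As a preview of the measurement topics above, a first-pass triage often starts with standard utilities before reaching for heavier profilers:

```shell
# A quick first pass at "finding bottlenecks" (coreutils/procps tools):
uptime              # load averages: is the machine busy at all?
free -m             # memory and cache usage in megabytes
df -h               # filesystem capacity; full disks cause odd slowness
cat /proc/loadavg   # the raw numbers behind uptime's load averages
```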
    Additional Materials:

    Half Day Morning

    Chris McEniry, Sony Interactive Entertainment

    Chris "Mac" McEniry is a practicing sysadmin responsible for running a large ecommerce and gaming service. He's been working and developing in an operational capacity for 15 years. In his free time, he builds tools and thinks about efficiency.

    T1
    Close to the Edge Systems Administration in Go
    Chris McEniry, Sony Interactive Entertainment


    9:00 am–12:30 pm
    Seacliff C Room

    There are many times when the daily grind pushes you out of your comfort zone. Sometimes you're in a bind, and the best way forward is fashioning a tool out of what's available. Sometimes those really are nails you see around you. This class looks at some of the normal, and some of the not so normal, uses for Golang in systems administration.

    Who should attend:
    • New Golang programmers who want to get a better idea of using the language (should have some familiarity with Golang).
    • Old dogs looking for new tricks.
    Take back to work:
    • Several MacGyver tools that may come in handy.
    • Techniques and approaches for some out of the box thinking.
    Topics include:
    • Running a quick-and-dirty TLS-secured web server for file transfers
    • Collecting and serving up system metrics
    • Driving web applications from the command line
    • Speaking HTTP/2
    • Fanning out shell commands from one system to many over SSH
    • Rolling your own container system
    • and more
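
    One of the tricks in that list, fanning out work to many systems, rests on Go's goroutines; here is a minimal sketch in which the SSH call is replaced by a placeholder function and all names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut runs fn against every target concurrently and collects the
// results in input order. In a real tool fn would run a command over
// SSH; here it is any function from a hostname to its output.
func fanOut(targets []string, fn func(string) string) []string {
	results := make([]string, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			results[i] = fn(t)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	hosts := []string{"web1", "web2", "db1"}
	for _, line := range fanOut(hosts, func(h string) string {
		return "uptime from " + h // stand-in for an SSH command
	}) {
		fmt.Println(line)
	}
}
```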
    Additional Materials:

    Sasha Goldshtein, CTO, Sela Group

    Sasha Goldshtein is the CTO of Sela Group, a Microsoft MVP, Pluralsight author, and international consultant and trainer. Sasha is the author of two books and multiple online courses, and a prolific blogger. He is also an active open source contributor to projects focused on system diagnostics, performance monitoring, and tracing—across multiple operating systems and runtimes. Sasha authored and delivered training courses on Linux performance optimization, event tracing, production debugging, mobile application development, and modern C++. Between his consulting engagements, Sasha speaks at international conferences worldwide.

    T2
    Linux Performance Monitoring with BPF
    Sasha Goldshtein, CTO, Sela Group


    9:00 am–12:30 pm
    Garden Room

    eBPF (extended Berkeley Packet Filters) is a modern kernel technology that can be used to introduce dynamic tracing into a system that wasn't prepared or instrumented in any way. The tracing programs run in the kernel, are guaranteed to never crash or hang your system, and can probe every module and function—from the kernel to user-space frameworks such as Node and Ruby.

    In this workshop, you will experiment with Linux dynamic tracing first-hand. First, you will explore BCC, the BPF Compiler Collection, which is a set of tools and libraries for dynamic tracing. Many of your tracing needs will be answered by BCC, and you will experiment with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the "baked" tools for file I/O, network, and CPU analysis. You'll be able to choose between working on a set of hands-on labs prepared by the instructors, or trying the tools out on your own test system.

    Next, you will hack on some of the bleeding edge tools in the BCC toolkit, and build a couple of simple tools of your own. You'll be able to pick from a curated list of GitHub issues for the BCC project, a set of hands-on labs with known "school solutions", and an open-ended list of problems that need tools for effective analysis. At the end of this workshop, you will be equipped with a toolbox for diagnosing issues in the field, as well as a framework for building your own tools when the generic ones do not suffice.
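    Many of the "baked" BCC tools summarize events in-kernel as power-of-two histograms before handing only the aggregate to user space. As a taste of that idea, here is a pure-Python sketch of the same bucketing (the sample latency values are invented for illustration; the real tools do this inside a BPF map):

```python
# Sketch of the power-of-two bucketing that BPF histogram tools use
# in-kernel; the latency values below are invented for illustration.
from collections import Counter

def log2_slot(value: int) -> int:
    """Bucket index for a positive value: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ..."""
    slot = 0
    while value > 1:
        value >>= 1
        slot += 1
    return slot

def histogram(values):
    """Aggregate values into log2 buckets, as a BPF map would."""
    return Counter(log2_slot(v) for v in values)

if __name__ == "__main__":
    latencies_us = [3, 5, 6, 12, 13, 14, 120, 130, 4000]
    for slot, count in sorted(histogram(latencies_us).items()):
        lo, hi = 1 << slot, (1 << (slot + 1)) - 1
        print(f"{lo:>6} -> {hi:<6} us : {'*' * count}")
```

    Aggregating in the kernel and exporting only bucket counts is what keeps these tools low-overhead even on hot code paths.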

    Who should attend:

    Developers, SREs, and ops engineers

    Take back to work:

    Low-overhead, production-ready tools based on the BPF kernel technology for CPU sampling, memory leak analysis, I/O and file issues, and many other performance and troubleshooting scenarios.

    Topics include:

    Performance, Monitoring, Tracing, BPF, Kernel

    Additional Materials:

    Mike Weilgart, Vertical Sysadmin, Inc.

    Mike Weilgart has loved maths and computers all his life. Graduating from high school at the age of 13, he thereafter worked in a variety of roles, including software QA, calculus teaching, and graphic design, before resolving to put his love of computers to professional use as a Linux sysadmin and trainer. Mike currently consults at a Fortune 50 company as an automation specialist, and enjoys nothing more than training people to full mastery of their tools.

    T3
    Git Foundations: Unlocking the Mysteries
    Mike Weilgart, Vertical Sysadmin, Inc.

    Mike Weilgart has loved maths and computers all his life. Graduating from high school at the age of 13, he thereafter worked in a variety of roles, including software QA, calculus teaching, and graphic design, before resolving to put his love of computers to professional use as a Linux sysadmin and trainer. Mike currently consults at a Fortune 50 company as an automation specialist, and enjoys nothing more than training people to full mastery of their tools.

    9:00 am–12:30 pm
    Seacliff AB Room

    Dozens of commands! Hundreds of options! Git has dumbfounded sysadmins and developers alike since its appearance in 2005.

    And yet, this ingenious software is among the most fantastically useful ever developed.

    Learn Git from the ground up and the inside out with Git Foundations Training!

    This half-day class explores Git's internals in depth and includes unique practical exercises to gain familiarity and comfort in handling the nuts and bolts.

    Bring with you:

    • A laptop with a UNIX-like command-line environment on which "git --version" displays a version (any version).
    • A willingness to learn.
    Who should attend:

    No prior knowledge of Git is required. Basic Unix/Linux command line experience is assumed. Experienced users of Git have given rave reviews; the class is not aimed only at beginners, but at anyone wishing to thoroughly understand and use Git to the fullest.

    Take back to work:
    • A thorough and practical understanding of the internals of Git
    • The ability to easily and *confidently* manipulate Git repositories and their contents
    • Readiness to pick up and *quickly* learn more exotic and advanced Git commands (and to read the man pages easily!)
    Topics include:

    Git Internals are covered in depth, beginning from basic definitions and proceeding through the essentials of Graph Theory needed to appreciate Git's architecture. Plenty of audience Q&A throughout, live demonstrations, and diagrams. Following this complete theory portion comes the practical portion of the course, with hands-on exercises to ensure retention and application of all theory.
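    For instance, the very first definition in that theory portion—Git stores content as objects named by a hash over a typed header plus the content—can be reproduced in a few lines. This sketch (in Python rather than Git itself) computes the same ID that `git hash-object` prints for a blob:

```python
import hashlib

def blob_oid(content: bytes) -> str:
    """Git blob object ID: SHA-1 over the header 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

if __name__ == "__main__":
    # Matches: printf 'hello\n' | git hash-object --stdin
    print(blob_oid(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

    Everything else in `.git/objects` (trees, commits, tags) is named the same way, which is why the class can build up the whole object graph from this one rule.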

    Additional Materials:

    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company, and a provider of Top-Level Domain registry and DNS services. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    T4
    Speed Up Your Change Control: Streamline Your Change Process without Increasing Risks
    Jeanne Schock, Armstrong Flooring Inc.

    Jeanne Schock has a background in Linux/FreeBSD/Windows system administration that includes working at a regional ISP, a large video hosting company, and a provider of Top-Level Domain registry and DNS services. About 7 years ago she transitioned to a role building and managing processes in support of IT operations, disaster recovery, and continual improvement. She is a certified Expert in the IT Infrastructure Library (ITIL) process framework with in-the-trenches experience in Change, Incident, and Problem Management. Jeanne also has a pre-IT academic and teaching career and is an experienced trainer and public presenter.

    9:00 am–12:30 pm
    Marina Room

    Speedy Change Control is not an oxymoron. This tutorial will provide practical, actionable steps to streamline and speed up change control at your organization without increasing risks. In The Visible Ops Handbook, authors Behr, Kim, and Spafford identify a culture of change management as common to high-performing IT groups: “change management does not slow things down in these organizations.” This tutorial will help anyone wishing to implement phase one of the handbook: “Stabilize the Patient” and “Modify First Response.” While I draw heavily on IT Infrastructure Library (ITIL) guidance, much of this is common-sense good practice based on lessons learned from past success and failure. No special ticketing system, tools, or ITIL knowledge is necessary. I am a certified ITIL Expert with over five years of experience designing, improving, and managing a successful change management process at an audited technology company delivering public registry and DNS services running on complex technologies across international data centers.

    Who should attend:

    Individuals and managers involved in preparing for and deploying changes and software builds in production environments.

    Take back to work:
    • templates for change request types and procedures
    • templates for creating standard operating procedures
    • ITIL-aligned talking points for making your case for these process improvements
    Topics include:
    • Change management
    • Process
    • Different change types to help you speed up the process
    • Assessing risks and potential impact
    • Defining change authorities specific for each change type
    • Metrics for measuring change process performance against goals
    • Release and deployment management
    • Continuous delivery
    Additional Materials:

    Half Day Afternoon

    Camille Baldock, Salesforce

    Camille Baldock is an infrastructure engineer with the Heroku Department of Data. She works on distributed systems monitoring, operations, automation, and tuning for Heroku Postgres.

    T6
    Troubleshooting Performance Issues in PostgreSQL
    Camille Baldock, Salesforce

    Camille Baldock is an infrastructure engineer with the Heroku Department of Data. She works on distributed systems monitoring, operations, automation, and tuning for Heroku Postgres.

    1:30 pm–5:00 pm
    Marina Room

    This tutorial will give you ways of diagnosing and preempting PostgreSQL performance issues using a wide range of tools and techniques to measure and improve your database's performance. We will cover query optimisation, configuration, and OS settings for your database server and pooling, caching, replication, and partitioning strategies that can be used to ensure performance at scale.

    Who should attend:

    The target audience for this tutorial is server administrators and developers working with PostgreSQL, or considering using it. No specific knowledge of PostgreSQL is required, but some background in RDBMS or SQL is recommended.

    Take back to work:

    System administrators will benefit by learning about:

    • what aspects of server and PostgreSQL configuration affect database performance and how to choose and tweak them
    • how to monitor the database server to maintain high performance

    Developers will benefit by learning about:

    • detecting performance issues in their database usage
    • optimising their queries
    Topics include:
    • This tutorial is organized around the various potential causes of performance issues in PostgreSQL: how to diagnose, fix, and monitor them
    • Query performance issues
    • Choosing the right PostgreSQL configuration within hardware and OS limitations
    • Operating system and hardware tweaks that can affect performance
    • Optimising database usage
    • Monitoring the performance of your database and database servers
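    As one concrete example of the monitoring theme, a common first check is the buffer cache hit ratio, derived from the blks_hit and blks_read counters PostgreSQL exposes in pg_stat_database. A small sketch of the arithmetic (the counter values and the ~99% rule of thumb are illustrative, not from the course materials):

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers.

    The two arguments correspond to the pg_stat_database counters;
    as a rough rule of thumb, a ratio persistently below ~0.99 on an
    OLTP workload suggests shared_buffers or the working set deserves
    a closer look.
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

if __name__ == "__main__":
    # Invented counter values for illustration.
    print(f"hit ratio: {cache_hit_ratio(990_000, 10_000):.2%}")
```

    The same division can of course be done in SQL directly against pg_stat_database; the point is to trend it over time rather than read it once.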
    Additional Materials:

    John Looney, Intercom

    John Looney is an SRE at Intercom, pretending to be a Product Engineer, improving infrastructure and reliability while pretending to also add features customers want.

    Previously, he spent a decade in Google SRE running GFS, Borg, Colossus, Chubby, Datacenter Automation, Ads Quality pipelines and Ads Serving systems.

    He has been on the programme committee of SRECon Dublin for the last three years, and presented a 'Large Scale Design' tutorial at LISA in 2012.

    T7
    Distributed Systems Building Blocks
    John Looney, Intercom

    John Looney is an SRE at Intercom, pretending to be a Product Engineer, improving infrastructure and reliability while pretending to also add features customers want.

    Previously, he spent a decade in Google SRE running GFS, Borg, Colossus, Chubby, Datacenter Automation, Ads Quality pipelines and Ads Serving systems.

    He has been on the programme committee of SRECon Dublin for the last three years, and presented a 'Large Scale Design' tutorial at LISA in 2012.

    1:30 pm–5:00 pm
    Seacliff AB Room

    All distributed systems make tradeoffs and compromises. Different designs behave very differently with respect to cost, performance, and how they behave under failure conditions.

    It's important to understand the tradeoffs that the building blocks in your systems make, and the implications this has for your system as a whole. In this workshop we'll look at several examples of different real-world distributed systems and discuss their strengths and shortcomings.

    This workshop will include some practical elements. Attendees will be given some system designs to read and to evaluate, and then we'll discuss the implications of each design together as a group.

    Who should attend:

    People working with distributed systems who want to fill in the blanks as to what "distributed systems" are supposed to be.

    Take back to work:

    Attendees will know the basic building blocks of distributed systems and how to choose between different implementations as needed.

    They will also know the names and basic details of common distributed-systems patterns, why they exist, and what happens when they are not applied correctly.

    Topics include:

    Distributed Systems Primer
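    One staple building block such a primer often covers is consistent hashing, which keeps key placement stable as nodes join and leave. A minimal sketch (the vnode count and hash choice here are arbitrary, and which blocks the workshop actually covers is up to the instructor):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: removing a node only remaps the
    keys that node owned, instead of reshuffling the whole keyspace."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str):
        # Each node gets several virtual points to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key: str) -> str:
        # A key belongs to the first virtual point at or after its hash.
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[i][1]
```

    Compare this with modulo hashing (`hash(key) % len(nodes)`), where removing one node remaps nearly every key—exactly the kind of trade-off discussion the workshop is about.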

    Additional Materials:

    Aleksey Tsalolikhin, Vertical Sysadmin, Inc.

    Aleksey Tsalolikhin is a practitioner in the area of Operations of information systems. Aleksey's mission is to improve the lives of fellow practitioners through effective training in excellent technologies. Aleksey is the principal at Vertical Sysadmin, which provides on-site training on UNIX shell basics, version control with Git, Configuration Management, Continuous Integration/Continuous Deployment, SQL basics and more.

    T8
    Setting up CI/CD Pipelines
    Aleksey Tsalolikhin, Vertical Sysadmin, Inc.

    Aleksey Tsalolikhin is a practitioner in the area of Operations of information systems. Aleksey's mission is to improve the lives of fellow practitioners through effective training in excellent technologies. Aleksey is the principal at Vertical Sysadmin, which provides on-site training on UNIX shell basics, version control with Git, Configuration Management, Continuous Integration/Continuous Deployment, SQL basics and more.

    1:30 pm–5:00 pm
    Garden Room

    Attendees will learn how CI/CD pipelines can increase IT velocity (from Dev to Ops), increase code quality, and lower risk, and will learn how to implement CI/CD pipelines in two popular tools, Jenkins and GitLab CI.

    Who should attend:

    Infrastructure engineers, system administrators, or DevOps engineers familiar with Git who have to set up or support CI/CD pipelines.

    Take back to work:

    Familiarity with CI/CD concepts; ability to implement CI/CD pipelines using popular tools such as Jenkins and GitLab CI.

    Topics include:
    • Introduction and orientation
      • Origin of Continuous Integration (CI) at ThoughtWorks
      • Widespread adoption; how CI relates to DevOps
      • Basic tasks: Build, Test, Deploy
    • Jenkins
      • Overview and Architecture
      • Definition of Key Terms
      • Building, Testing and Deploying (with hands-on lab)
      • Checking Pipeline status with Jenkins Blue Ocean UI
      • Troubleshooting
    • GitLab CI
      • Architecture: GitLab, GitLab CI Multi Runner, ephemeral test environments
      • Definitions: pipeline, stage, job, build, runner, environment, artifact, cache
      • Setting up runners: adding job runners; host instance types (shell, Docker, ssh, etc.); runner/job tags
      • Building, Testing, and Deploying (with hands-on lab)
      • Troubleshooting: build logs; enabling verbose builds; increasing "loglevel"; interactive access to containers
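    The GitLab CI pieces above come together in a .gitlab-ci.yml file at the repository root. A minimal illustrative sketch, using standard GitLab CI keywords (the stage and job names, image, and script commands are invented, not course material):

```yaml
# Illustrative .gitlab-ci.yml: three stages, one job each.
stages:
  - build
  - test
  - deploy

build-job:
  stage: build
  image: python:3.11          # job runs in an ephemeral container
  script:
    - pip install -r requirements.txt

test-job:
  stage: test
  image: python:3.11
  script:
    - pytest

deploy-job:
  stage: deploy
  script:
    - ./deploy.sh             # placeholder deploy step
  only:
    - master
```

    Jobs in the same stage run in parallel on available runners; a stage starts only after the previous stage's jobs succeed.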
    Additional Materials:

    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PAAS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    T9
    Build, Ship, and Run Microservices on a Docker Swarm Cluster
    Jérôme Petazzoni, Docker Inc.

    Jerome is a senior engineer at Docker, where he helps others to containerize all the things. In another life he built and operated Xen clouds when EC2 was just the name of a plane, developed a GIS to deploy fiber interconnects through the French subway, managed commando deployments of large-scale video streaming systems in bandwidth-constrained environments such as conference centers, operated and scaled the dotCloud PAAS, and various other feats of technical wizardry. When annoyed, he threatens to replace things with a very small shell script.

    1:30 pm–5:00 pm
    Seacliff C Room

    In this tutorial, you will set up your own Docker cluster, using the native orchestration features provided by the SwarmKit library. (SwarmKit has been integrated with the Docker Engine since Docker 1.12.)

    Then, you will use that cluster to deploy and scale a sample application architected around microservices.

    We will cover deployment tips, service discovery, load balancing; we will show how to integrate Swarm and Compose to obtain a seamless, automated "dev-to-prod" workflow; and we will show how to collect logs and metrics on a containerized platform.

    To get the most out of this tutorial, you should already be familiar with Docker! If you plan to attend this just after the other tutorial "Getting started with Docker and containers," you will definitely have to mind the gap.

    The tutorial will be hands-on; each attendee will be provided with a cluster of Docker nodes running on cloud VMs. The only software required on your machine is an SSH client (and a web browser).
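    For orientation, the dev-to-prod workflow mentioned above typically centers on a Compose-format file with a deploy section, handed to docker stack deploy. A minimal illustrative sketch, assuming a Docker release new enough to support stacks (1.13+); the image and service names are placeholders:

```yaml
# Illustrative stack file for: docker stack deploy -c stack.yml myapp
version: "3"
services:
  web:
    image: myorg/web:latest     # placeholder application image
    ports:
      - "80:80"
    deploy:
      replicas: 4               # Swarm schedules 4 tasks across the cluster
      update_config:
        parallelism: 2          # rolling update, two tasks at a time
  redis:
    image: redis:alpine
```

    The same file format drives local development with Compose, which is what makes the Swarm-plus-Compose workflow feel seamless.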

    Who should attend:

    Folks who were excited by (or forced to deploy) Docker Swarm, but want to go beyond the trivial prototype, implement a seamless dev-to-prod workflow, and tackle logging, metrics, security, etc.

    Take back to work:

    After this tutorial, the audience will know how to map their existing "ops knowledge" of traditional platforms to container platforms.

    Topics include:

    Docker, cluster, Swarm, orchestration, containers

    Additional Materials:
    Monday, February 12, 2018

    Half Day Morning

    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    M1
    Understanding Large-Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    Grand Ballroom ABCD

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include:
    • SSD technology
    • Scaling the data path
    • Scaling metadata
    • Fault tolerance
    • Manageability
    • Cloud storage

    Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (the Hadoop File System), BigTable, LevelDB, and Google's Colossus File System.

    Additional Materials:

    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching. He worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and by grants from the Icelandic Center for Research.

    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was the R&D tech lead for the DRS team and a co-inventor of flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    M2
    Caches for the Persistent Memory and Flash Era
    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching. He worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and by grants from the Icelandic Center for Research.

     
    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was the R&D tech lead for the DRS team and a co-inventor of flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    9:00 am–12:30 pm

    Grand Ballroom EFGH

    For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy improvements has been highly motivated and continues with strong improvements to this day. However, there are certain areas in cache algorithms optimization that have only recently experienced breakthroughs.

    In this tutorial, we will start by reviewing the history of the caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies that are getting more complex and deep due to hardware innovations. These hierarchies and the workloads they generate motivate revisiting multi-tier algorithms. We will then describe cache utility curves and review recent literature that has made them easier to compute. Using this tool, we will excavate around caching policies and their trade-offs in different contexts. We will also spend some time thinking about optimality for caches.
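    A cache utility curve (miss-ratio curve) for LRU can be computed in one pass over a reference trace with Mattson's classic stack algorithm; much of the recent literature the tutorial reviews is about computing it approximately at far lower cost. A naive O(N·M) sketch:

```python
def lru_miss_ratio_curve(trace, max_size):
    """Naive Mattson stack algorithm: miss ratio of an LRU cache of
    each size 1..max_size over the given reference trace."""
    stack = []                        # LRU stack, most recently used last
    misses = [0] * (max_size + 1)     # misses[c] = misses at cache size c
    for x in trace:
        if x in stack:
            depth = len(stack) - stack.index(x)  # 1 = top of stack
            stack.remove(x)
        else:
            depth = float("inf")      # cold miss at every cache size
        stack.append(x)
        for c in range(1, max_size + 1):
            if depth > c:             # reuse distance exceeds capacity
                misses[c] += 1
    n = len(trace)
    return [misses[c] / n for c in range(1, max_size + 1)]
```

    Sampling-based techniques such as SHARDS reduce this to near-constant space and near-linear time; the exact version above is fine for small traces and for building intuition about why one pass suffices for all cache sizes.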

    Topics include:
    • Overview and history of the caching algorithm research and practice in industry
    • Introduction to new challenges posed by multi-tier memory hierarchies
    • Review of cache utility curves and recent literature
    • Experimenting with caching policies for production use cases
    • How to find the optimal cache
    Additional Materials:

    Half Day Afternoon

    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    M3
    Advanced Persistent Memory Programming: Local, Remote and Cross-Platform
    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

     
    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    1:30 pm–5:00 pm

    Grand Ballroom ABCD

    Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and from NVDIMM, PCI Express, storage-attached and network-attached interconnect access, it is available broadly across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

    Our tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA NonVolatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and via language and compiler approaches as well. The software PM landscape is already rich, and growing.

    Additionally, the tutorial will explore the considerations that arise when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access paths. While the programming paradigms remain common, the implications for latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and the application developer or designer.

    Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.

    Topics include:
    • Persistent memory
    • Persistent memory technologies
    • Remote persistent memory
    • Programming interfaces
    • Operating systems
    • Open source libraries
    • RDMA
    Additional Materials:

    Sandeep Uttamchandani, Intuit

    Sandeep Uttamchandani is a Distinguished Engineer at Intuit, focusing on platforms for storage, databases, analytics, and machine learning. Prior to Intuit, Sandeep was co-founder and CEO of a machine learning startup focused on finding security vulnerabilities in Cloud Native deployment stacks. Sandeep has nearly two decades of experience in storage and data platforms and has held various technical leadership roles at VMware and IBM. Over his career, Sandeep has contributed to multiple enterprise products, holds 35+ issued patents, has 20+ conference and journal publications, and regularly blogs on All-things-Enterprise-Data. He has a Ph.D. from the University of Illinois at Urbana-Champaign.

    M4
    Weaving Your Enterprise Data Fabric to Harness Machine Learning In-Production
    Sandeep Uttamchandani, Intuit

    Sandeep Uttamchandani is a Distinguished Engineer at Intuit, focusing on platforms for storage, databases, analytics, and machine learning. Prior to Intuit, Sandeep was co-founder and CEO of a machine learning startup focused on finding security vulnerabilities in Cloud Native deployment stacks. Sandeep has nearly two decades of experience in storage and data platforms and has held various technical leadership roles at VMware and IBM. Over his career, Sandeep has contributed to multiple enterprise products, holds 35+ issued patents, has 20+ conference and journal publications, and regularly blogs on All-things-Enterprise-Data. He has a Ph.D. from the University of Illinois at Urbana-Champaign.

    1:30 pm–5:00 pm

    Grand Ballroom EFGH

    Enterprises today have a plethora of information that needs to be harnessed for business insights. Over the years, Enterprises have made investments in a variety of storage solutions, relational databases, warehouses, NoSQL stores, Big Data analytics platforms, Data Lakes, Cloud Stores, etc. As we enter the era of Machine Learning (ML), it is important to understand how to bring these silos together to discover, build, and deploy ML models in production.

    This tutorial covers the technical concepts and architectural models required to operationalize and architect your Enterprise Data Fabric for ML initiatives. The tutorial is divided into the following sections:

    • A Data Engineering perspective on the end-to-end ML workflow in-production
    • Taxonomy of requirements & landscape of available building blocks for the Data Fabric
    • Putting it together: Defining the Data Fabric architecture with reference examples

    The tutorial assumes a basic knowledge of popular Big Data and Analytics solutions. We assume no ML background—our focus will be on operational concepts rather than the internal mathematical formulations of ML algorithms. The tutorial is designed for Storage architects, Data Engineers, and Engineering Managers interested in learning to design Data Fabrics.

    Topics include:
    • Different architectures for Data Stores (Relational, MPP, NoSQL, Event Stores, In-memory grids, etc.)
    • Different Analytic programming models and Frameworks (Batch, Interactive, Stream)
    • Example Cloud computing platforms for Data Management
    • Workflow for Machine Learning models in production
    • Blue-print of a Data Fabric
    • Example reference architectures of Data Fabric deployments
    Additional Materials:
    Monday, February 25, 2019

    Half Day Morning

    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through operating systems, network services, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. While at Panasas he helped build the PanFS cluster file system. He is the creator of the TclHttpd web server and the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    Morning Tutorial 1:
    Understanding Large Scale Storage Systems
    Brent Welch, Google

    Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through operating systems, network services, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. While at Panasas he helped build the PanFS cluster file system. He is the creator of the TclHttpd web server and the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

    9:00 am–12:30 pm

    Constitution Ballroom A

    This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

    Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

    The tutorial starts with a look at storage devices including traditional hard drives, SSD, and new non-volatile memory devices. Next, we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

    Topics include:
    • SSD technology
    • NVRAM
    • Scaling the data path
    • Scaling metadata
    • Fault tolerance
    • Manageability
    • Cloud storage
    Additional Materials:

    Half Day Afternoon

    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    Afternoon Tutorial 1:
    Advanced Persistent Memory Programming
    Andy Rudoff, Intel

    Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

     
    Tom Talpey, Microsoft

    Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

    2:00 pm–5:30 pm

    Constitution Ballroom A

    Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and from NVDIMM and PCI Express to storage-attached and network-attached interconnects, it is available broadly across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

    The tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA Non-Volatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and language and compiler approaches. The software PM landscape is already rich and growing.

    Additionally, the tutorial will explore the considerations when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access. While the programming paradigms remain common, the implications on latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and the application developer or designer.

    Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.
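
    The store-then-flush discipline at the heart of these programming models can be illustrated with a minimal Python sketch over a plain memory-mapped file. On a DAX-mounted PM filesystem the flush becomes a true persistence point; production code would typically use PMDK's libpmem and cache-line flush instructions instead. The path and helper names here are illustrative:

    ```python
    import mmap
    import os
    import struct

    PATH = "/tmp/pmem_demo.bin"   # stand-in for a file on a DAX-mounted PM filesystem
    SIZE = 4096

    def open_pm(path, size):
        """Map a file into the address space for load/store access."""
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        os.ftruncate(fd, size)
        mm = mmap.mmap(fd, size)
        os.close(fd)              # the mapping stays valid after closing the fd
        return mm

    def store_counter(mm, value):
        """Store a value with ordinary memory writes, then make it durable."""
        mm[0:8] = struct.pack("<Q", value)
        mm.flush()                # on a DAX mapping this is the persistence point

    def load_counter(mm):
        """Read the value back with ordinary memory loads—no read() syscall."""
        return struct.unpack("<Q", mm[0:8])[0]
    ```

    After a crash, re-mapping the same file and reading the counter recovers the last flushed value—the recovery contract that the flush step is there to guarantee.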

    Topics include:
    • Persistent Memory
    • Persistent Memory Technologies
    • Remote Persistent Memory
    • Programming Interfaces
    • Operating Systems
    • Open Source Libraries
    • RDMA
    Additional Materials:

    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor for flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching: he worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and grants from the Icelandic Center for Research.

    Afternoon Tutorial 2:
    Caches in the Modern Memory Hierarchy with Persistent Memory and Flash
    Irfan Ahmad, CachePhysics

    Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor for flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

    Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

     
    Ymir Vigfusson, Emory University

    Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

    His primary research interests are in distributed systems and caching: he worked on cache replacement in IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir serves on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and grants from the Icelandic Center for Research.

    2:00 pm–5:30 pm

    Constitution Ballroom B

    For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy has been highly motivated and continues to deliver strong improvements to this day. However, certain areas of cache algorithm optimization have only recently experienced breakthroughs.

    In this tutorial, we will start by reviewing the history of caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies that are getting more complex and deep due to hardware innovations. These hierarchies motivate revisiting multi-tier algorithms. We will then review a key tool of cache research and management, the cache utility curve, along with recent literature that has made such curves much easier to compute. Using this tool, we will dig into caching policies and their trade-offs. We will also spend some time thinking about optimality for caches in modern memory hierarchies with DRAM, non-volatile/persistent memory, and flash.
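
    As a concrete illustration of a cache utility curve, the sketch below computes an LRU miss-ratio curve using Mattson's classic stack-distance algorithm—the naive O(N·M) formulation, not the faster techniques from recent literature. Function and variable names are illustrative:

    ```python
    from collections import Counter

    def lru_miss_ratio_curve(trace, max_size):
        """Return {cache_size: miss_ratio} for an LRU cache over a reference trace.

        The stack distance of a reference is the number of distinct items
        touched since the last access to that item; an LRU cache of at least
        that size would have hit."""
        stack = []        # LRU stack: most recently used item first
        hist = Counter()  # hist[d] = number of references with stack distance d
        for x in trace:
            if x in stack:
                d = stack.index(x) + 1   # 1-based stack distance
                hist[d] += 1
                stack.remove(x)
            # cold misses have infinite stack distance: never a hit
            stack.insert(0, x)
        n = len(trace)
        curve, hits = {}, 0
        for size in range(1, max_size + 1):
            hits += hist[size]           # all refs with distance <= size hit
            curve[size] = 1.0 - hits / n
        return curve
    ```

    For the trace a b c a b c, every repeated reference has stack distance 3, so the curve stays at a 100% miss ratio until the cache holds 3 items, then drops to 50%—exactly the kind of cliff that makes utility curves useful for sizing decisions.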

    Topics include:
    • Overview and history of the caching algorithm research and practice in industry
    • Introduction to new challenges posed by multi-tier memory hierarchies
    • Review of cache utility curves and recent literature
    • Experimenting with caching policies for production use cases
    • How to find the optimal cache
    Additional Materials: