Sunday, June 14, 2009
Sunday Full-Day Tutorials
S2 ZFS: A File System for Modern Hardware NEW!
Richard Elling, Enterprise Systems Consultant
Who should attend: Systems engineers, integrators, and administrators who are interested in deploying ZFS on Solaris, Mac OS X, or FreeBSD.
Participants should be familiar with storage devices, RAID systems,
logical volume managers, backup, and file system features. Special
emphasis will be placed on integration considerations for virtualization,
NAS, and databases.
File systems developed in the mid-20th century were severely
constrained by the storage hardware available at the time. ZFS was
conceived with an eye toward the hardware of the future and how
storage will evolve. This presented an opportunity to rethink how
file systems use storage hardware. The result is a new way of managing
data which can evolve as the hardware changes while remaining
compatible with earlier notions of file system use. Along the way,
new concepts such as the Hybrid Storage Pool provide new opportunities for optimization, efficiency,
and data protection. In this tutorial,
ZFS will be examined from the bottom up, to build a solid understanding
of the data-hardware interface, and then from the top down, to provide
insight into the best ways to use ZFS for applications.
Take back to work: A solid understanding
of the concepts behind ZFS and how to make the best decisions when
implementing storage at your site.
- Evolution of hardware and file systems
- Storage pools
- RAID data protection
- Import/export and shared storage
- Pool parameters and features
- On-disk format
- Data sets
- POSIX-compliant file systems
- Practical considerations and best practices
- Deployment and migration
- Performance, observability, and tuning
- Data protection
- Hybrid storage pools
- Backup, restore, and archiving
S3 Practical Problem Solving with Hadoop & Pig
Milind Bhandarkar, Yahoo! Grid Solutions Team
Who should attend: Scientists, engineers, and developers interested in developing large-scale,
data-intensive applications. No previous experience with Hadoop or Pig is
required. Demonstrations will be presented, where appropriate.
Take back to work: The ability to:
- Design and develop Hadoop and Pig applications and higher-level
application frameworks to crunch several terabytes of data, using anywhere
from four to 4,000 computers
- Contribute to
Hadoop and Pig projects, as well as the rest of the Hadoop ecosystem
- Consult with engineering teams on the proper way to write and deploy
programs on either dedicated or shared Hadoop clusters
- Maximize performance of Hadoop and Pig applications
- Introduction to the Hadoop Distributed File System and Map-Reduce programming framework
- Several real applications and their implementation
- The Pig higher-level language for programming
- Performance tuning for both Hadoop and Pig applications
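As a rough illustration of the Map-Reduce programming model named in the topics above, the following single-process Python sketch counts words in three phases: map, shuffle, and reduce. It does not use the Hadoop API; the function names and sample records are hypothetical and chosen only to show how the phases fit together.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

# Hypothetical input records; a real job would read them from a distributed file system.
records = ["Pig and Hadoop", "hadoop at scale"]
pairs = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run on many machines and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.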
S5 The Python Programming Language
David Beazley, Dabeaz LLC
Who should attend: Programmers who want to know what the Python programming language is all
about and how it can be applied to a variety of practical problems in
data analysis, system administration, systems programming, and
networking. Although no prior Python knowledge is required, attendees
should be experienced in at least one other
programming language, such as C, C++, Java, or Perl. If you already
know some Python, this tutorial will improve your skills.
Python is a dynamic programming language that is often described as a
scripting language, as are languages such as Perl, Tcl, and Ruby.
Although Python is often used for scripting, it is actually a
full-featured general-purpose programming language which supports a
wide variety of imperative, functional, and object-oriented
programming idioms. It also includes a large standard library that
provides support for operating system interfaces, networking, threads,
regular expressions, XML, GUIs, and more.
This tutorial will take a comprehensive tour of the Python
programming language and see how it can be used to solve a variety of
practical problems. The tutorial will illustrate important concepts
through examples that primarily focus on data analysis, systems
programming, and system administration.
Take back to work: A better understanding of what makes Python tick and an increased
awareness of how it can be applied to real-world problems.
- The Python language
- Basic syntax
- Core datatypes
- Control flow and exception handling
- Classes and the Python Object Model
- C extensions
- Major library modules
- Text processing
- Operating system interfaces
- Network programming
- Internet programming
- Practical programming examples
- Text parsing
- Data analysis and manipulation
- Processing log files
- Handling real-time data streams
- Controlling and interacting with subprocesses
- Interacting with Web services
- Simple network programming
- Internet data handling
S6 Implementing Security the Right Way
Abe Singer, California Institute of Technology
Who should attend: System administrators, programmers, and geeks in general. The class will be
laced with anecdotes, examples, and horror stories based on the author's
"Best practices" for security are easy to find, but are they really best?
Are they right for your environment? Are they more than just slapdash
patches to security flaws?
Have you ever heard someone (maybe you) saying, "That would
never happen," "No attacker would ever figure that out," "It's secure, it
uses encryption," or "We're safe, we have a firewall"? One of the
problems with "best practices" is that people follow them blindly, with no real understanding of why those practices were instituted, how they work, or what's effective in their own
Maintaining the security of a system cannot be done by following a fixed set
of rules or a checklist. Nor can it be done effectively by setting out draconian
policies and expecting all users to behave well. The goal of this class is
to get you to think about security and to be able to evaluate security tools
and techniques with regard to your environment and goals.
Programmers develop software, and system administrators deploy and support
that software. Often the system administrator has to struggle with software
that doesn't fit well into his site's security architecture, because it makes
assumptions about how systems are installed and maintained. A simple
example: Some software assumes that the host it's running on is "behind
a firewall" (whatever that really means), but not all sites can run in that
manner. Moreover, programmers are often unaware of the different ways in which
sites are designed, of interactions with other systems, or of the simple issues
of scale that face a sysadmin who has to manage many services across multiple
System administrators, take back to work: A better handle
on how to look at how security affects your systems;
what questions to ask when evaluating the security of software
or a system; and how to think about the security impacts of your work.
Programmers, take back to work: An understanding of the security issues that affect
software and the common mistakes programmers make when implementing (or not
implementing) security measures. This is more than the old "avoid buffer
overflows" mantra! We'll talk in depth about how to think about incorporating
security technology and methods into application design.
- Common concepts and misconceptions about security
- A look at various security technologies and what they do
- At a high level:
- How to think about security and apply that thinking to everyday
administrative activities and programming
- How to approach making security
assessments and risk analyses
- How to go about responding to an intrusion
- At a lower level:
- Commonly used network services
- Security aspects of their protocols and configurations
Monday, June 15, 2009
Monday Full-Day Tutorials
M1 System and Network Performance Tuning
Marc Staveley, Soma Networks
Who should attend: Novice and advanced UNIX system and network administrators, and UNIX developers concerned about network performance impacts. A basic understanding of UNIX system facilities and network environments is assumed.
We will explore procedures and techniques for tuning systems, networks, and application code. Starting from the single system view, we will examine how the virtual memory system, the I/O system, and the file system can be measured and optimized. We'll extend the single host view to include Network File System tuning and performance strategies. Detailed treatment of networking performance problems, including network design and media choices, will lead to examples of network capacity planning. Application issues, such as system call optimization, memory usage and monitoring, code profiling, real-time programming, and techniques for controlling response time will be addressed. Many examples will be given, along with guidelines for capacity planning and customized monitoring based on your workloads and traffic patterns. Question and analysis periods for particular situations will be provided.
Take back to work: Procedures and techniques for tuning your systems, networks, and application code, along with guidelines for capacity planning and customized monitoring.
- Performance tuning strategies
- Practical goals
- Monitoring intervals
- Useful statistics
- Tools, tools, tools
- Server tuning
- Filesystem and disk tuning
- Memory consumption and swap space
- System resource monitoring
- NFS performance tuning
- NFS server constraints
- NFS client improvements
- NFS over WANs
- Automounter and other tricks
- Network performance, design, and capacity planning
- Locating bottlenecks
- Demand management
- Media choices and protocols
- Network topologies: bridges, switches, and routers
- Throughput and latency considerations
- Modeling resource usage
- Application tuning
- System resource usage
- Memory allocation
- Code profiling
- Job scheduling and queuing
- Real-time issues
- Managing response time
M2 Introduction to the Open Source Xen Hypervisor
Zach Shepherd and
Wenjin Hu, Clarkson University
Who should attend: System administrators and architects who are interested in running server services in virtual machines and deploying the open source Xen hypervisor in a production environment. No prior experience with Xen is required; however, a basic knowledge of Linux is helpful.
The Xen hypervisor is an innovative virtualization infrastructure
that provides fast and secure execution for multiple virtual machines. It has been used to virtualize a wide range of guest operating systems, including Windows, Linux, Solaris, and various versions of the BSD operating systems. It is widely regarded as a compelling alternative to proprietary virtualization platforms and hypervisors for x86-compatible platforms, and it is commonly deployed in industrial
and commercial environments as a promising approach to creating dynamic
datacenters and virtual servers.
Take back to work: How to build and deploy the Xen hypervisor.
- Overview of virtualization
- Overview of Xen architecture
- Virtual machine creation and operation
- Installation and configuration
- Networking and storage options
- Performance: tools and methodology
- Best practices using Xen
M3 Care and Feeding of Hadoop Clusters
Marco Nicosia, Yahoo! Grid Operations Team
Who should attend: Engineers and system administrators interested in evaluating the
operational aspects of Hadoop and those who are already charged with the
installation and upkeep of medium to large Hadoop clusters. No
previous experience with Hadoop is required.
This class will take an in-depth look at the operation
of Hadoop clusters, focusing on practical procedures. Although not hands-on, the presentation material will focus on the specific command lines
required. Demonstrations will be presented where appropriate.
Take back to work: Confidence in your ability to
safely and efficiently operate a Hadoop cluster.
- Planning and designing a Hadoop deployment using anywhere from four to 4,000 computers
- The functional underpinnings of Hadoop and how user
code is automatically executed across the computers in a Hadoop
- How to consult with engineering teams on the proper way to write and
deploy programs on either dedicated or shared Hadoop clusters
- Downloading, configuring, and distributing the Hadoop software
- Starting, stopping, and monitoring the status of both the Hadoop Distributed
File System and Map-Reduce components
- How to perform periodic maintenance to ensure the overall health of the
HDFS system, especially with respect to data integrity
- Configuring and managing the Map-Reduce job scheduler and user queues
- The correct series of steps to safely upgrade
the Hadoop software to a newer release, as well as how to safely
back out from such an upgrade (and understand the costs of such a
- Adding large amounts of data to the HDFS, as well as adding or removing
machines from the cluster (and seamlessly migrating to an
entirely different bank of computers!)
- Moving large data between HDFS instances
- Writing simple Hadoop programs in shell script and Pig for
system administration data analysis
M4 Administering Linux in Production Environments
Theodore Ts'o, IBM/Linux Foundation
Who should attend: Both current Linux system administrators and administrators from sites considering converting to Linux or adding Linux systems to their current computing resources.
This course discusses using Linux as a production-level operating system. Linux is used on the front line for mission-critical applications in major corporations and institutions, and mastery of this operating system is now becoming a major asset to system administrators.
Linux system administrators in production environments face many challenges: the inevitable skepticism about whether an open source operating system will perform as required; how well Linux systems will integrate with existing computing facilities; how to locate, install, and manage high-end features which the standard distributions may lack; and many more. Sometimes the hardest part of ensuring that the system meets production requirements is matching the best solution with the particular local need. This course is designed to give you a broad knowledge of production-worthy Linux capabilities, as well as where Linux currently falls short. The material in the course is all based on extensive experience with production systems.
This course will cover configuring and managing Linux computer systems in production environments. We will be focusing on the administrative issues that arise when Linux systems are deployed to address a variety of real-world tasks and problems arising from both commercial and research and development contexts. This course is designed for both current Linux system administrators and for administrators from sites considering converting to Linux or adding Linux systems to their current computing resources.
Take back to work: The knowledge necessary to add reliability and availability to your systems and to assess and implement tools needed for production-quality Linux systems.
- Recent kernel developments
- High-performance I/O
- Advanced file systems and the LVM
- Disk striping
- Optimizing I/O performance
- Advanced compute-server environments
- HPC with Beowulf
- Clustering and high availability
- Parallelization environments/facilities
- CPU performance optimization
- Enterprise-wide security features, including centralized authentication
- Automation techniques and facilities
- Linux performance tuning
Monday Morning Half-Day Tutorials
M5 Introduction to Python Concurrency
David Beazley, Dabeaz LLC
Who should attend: Python programmers who would like to know more about concurrent
programming idioms and library modules. Attendees should be familiar
with core Python datatypes (lists, dictionaries, etc.), functions,
classes, and commonly used modules in the standard library.
Even though Python is a high-level interpreted language, it is often
used to write applications that involve a high degree of concurrency
(for example, network servers managing thousands of clients).
Programmers working on such applications are often attracted to Python
because of its ease of programming as well as the large number of useful
library modules related to systems programming, networking, and
threads. However, a cruel irony is the fact that the Python
interpreter itself is only single-threaded—protected by a global lock
that makes it impossible for multi-threaded Python programs to scale
beyond one CPU core. Needless to say, this limitation impacts the way
developers address concurrent programming problems, especially
in programs using threads.
In this tutorial, we'll take a tour of how Python supports concurrent
programming. Topics will include traditional subjects such as threads
and message passing, along with more advanced topics such as
co-routines, cooperative multitasking, and asynchronous I/O.
Take back to work: A deeper understanding of how Python (and dynamic languages more
generally) is tackling concurrent programming problems. Python
programmers will get ideas on some of the techniques that might be
used to have programs take advantage of multiple CPU cores or operate
on a cluster.
- The Python interpreter execution model
- Understanding the global interpreter lock
- Thread programming
- Subprocesses and co-processes
- The multiprocessing library
- Data serialization
- Message passing
- Co-routines and cooperative multitasking
- Asynchronous I/O and event-driven programming
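To make the idea of co-routines and cooperative multitasking from the topic list concrete, here is a minimal sketch in which generator functions act as tasks and a small round-robin loop acts as the scheduler. The names `countdown` and `run_round_robin` are illustrative assumptions, not part of any library or of the tutorial material.

```python
from collections import deque

def countdown(name, n, log):
    """A task that records one step, then hands control back to the scheduler."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield  # cooperative yield point: the task pauses here voluntarily

def run_round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, so it is not rescheduled
        ready.append(task)      # task still has work, so it goes to the back of the queue

log = []
run_round_robin([countdown("a", 2, log), countdown("b", 3, log)])
# log == [("a", 2), ("b", 3), ("a", 1), ("b", 2), ("b", 1)]
```

Because each task yields explicitly, only one task runs at a time and no locks are needed; the trade-off is that a task which never yields blocks all the others.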
M6 Security Without Firewalls
Abe Singer, California Institute of Technology
Who should attend: Administrators who want or need to explore strong, low-cost, scalable security without firewalls.
Good, possibly better, network security can be achieved without relying on firewalls. The San Diego Supercomputer Center does not use firewalls, yet managed to go almost 4 years without an intrusion. Our approach defies some common beliefs, but it seems to work, and it scales well.
"Use a firewall" is the common mantra of much security documentation, and firewalls are the primary security "solution" in most networks. However, firewalls don't protect against activity by insiders, nor do firewalls provide protection against any activity that is allowed through the firewall. And, as is true for many academic institutions, firewalls just don't make sense in our environment. Weighting internal threats equally with external threats, SDSC has built an effective, scalable, host-based security model. The key parts of our model are centralized configuration management, regular and frequent patching, and strong authentication (no plaintext passwords). This model extends well to many environments beyond the academic.
Of course, we're not perfect, and we had a compromise as part of a security incident that spanned numerous institutions. However, firewalls would have done little if anything to have mitigated that attack, and we believe our approach to security reduced the scope of compromise and helped us to recover faster than some of our peers.
In addition, our system administration costs scale well. The incremental cost of adding a host to our network (beyond the cost of the hardware) is negligible, as is the cost of reinstalling a host.
Take back to work: How to build effective, scalable, host-based security without firewalls.
- The threat perspective from a data-centric point of view
- How to implement and maintain centralized configuration
management using cfengine, and how to build reference systems
for fast and consistent (re)installation of hosts
- Secure configuration and management of core network services such as NFS, DNS, and SSH
- Good system administration practices
- Implementing strong authentication and eliminating use of
plaintext passwords for services such as
- A sound patching strategy
- An overview of the compromise, how we recovered, and what we learned
Monday Afternoon Half-Day Tutorials
M7 Python Generator Hacking
David Beazley, Dabeaz LLC
Who should attend: Intermediate to advanced Python programmers who would like to know
more about practical uses of generator functions and generator
expressions. Attendees should be familiar with core Python datatypes
(lists, dictionaries, etc.), functions, classes, and commonly used
modules in the standard library. No previous experience with
generators is assumed.
Generators and generator expressions are among the most useful
features of Python. Yet many Python programmers are unsure how to
apply them to real-world problems, because examples tend
to focus on utterly useless tasks such as generating Fibonacci
numbers. This tutorial presents practical uses of generators, including
processing large data files, handling real-time data sequences,
parsing, threads, networking, and distributed computing.
This tutorial will completely change the way you look at Python in general
and at generators in particular. Upon completion, you'll probably want
to go home and rewrite all of your code.
Take back to work: A new understanding of how Python generators are an extremely powerful
and elegant solution to a wide variety of problems you face
every day but have probably been solving the hard way.
- Basic concepts of Python iteration
- Generator functions and generator expressions
- Using generators to set up processing pipelines (just like UNIX pipes, but better)
- Processing huge data files
- Processing real-time data streams and logs
- Generators and threads
- Generators and distributed computing
- Advanced generators and co-routines
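The processing-pipeline idea in the topic list can be sketched with a few small generator stages chained together, much like commands joined by UNIX pipes. The log format, the stage names (`grep`, `field`, `to_int`), and the sample data below are assumptions made up for this example.

```python
def grep(pattern, lines):
    """Pass through only the lines that contain the given substring."""
    return (line for line in lines if pattern in line)

def field(lines, index):
    """Yield one whitespace-separated column from each line that has it."""
    for line in lines:
        parts = line.split()
        if len(parts) > index:
            yield parts[index]

def to_int(values):
    """Convert values to integers, silently skipping anything non-numeric."""
    for value in values:
        try:
            yield int(value)
        except ValueError:
            continue

# Hypothetical access-log lines: client, method, path, status, bytes sent.
sample_log = [
    "10.0.0.1 GET /index.html 200 1024",
    "10.0.0.2 GET /missing 404 -",
    "10.0.0.1 GET /big.iso 200 4096",
]

# Each stage pulls one item at a time from the stage before it.
total_bytes = sum(to_int(field(grep(" 200 ", sample_log), 4)))
# total_bytes == 5120
```

Since every stage is lazy, `sample_log` could be replaced by an open file object and the pipeline would process a very large file one line at a time without loading it into memory.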
M8 Building a Logging Infrastructure and Log Analysis for Security
Abe Singer, California Institute of Technology
Who should attend: System, network, and security administrators who want to be able to separate the wheat of warning information from the chaff of normal activity in their log files.
This tutorial will show the importance of log files for maintaining
system security and general well-being, present some strategies for building
a centralized logging infrastructure, explain some of the types of
information that can be obtained for both real-time monitoring and
forensics, and describe techniques for analyzing log data to obtain useful
All the devices on a medium-sized network can generate millions of lines
of log messages a day. Although much of the information is normal activity,
hidden within that data can be the first signs of an intrusion, denial of
service, worms/viruses, and system failures.
Take back to work: How to get a handle on your log files, which can help you run your systems and networks more effectively and can provide forensic information for post-incident investigation.
- Problems, issues, and scale of handling log information
- Generating useful log information: improving the quality of your logs
- Collecting log information: syslog and friends, building a log host, integrating Microsoft Windows into a UNIX log architecture
- Storing log information: centralized log architectures and log file archiving
- Log analysis: Log file parsing tools, data analysis of log files (e.g., baselining), attack signatures, and other interesting things to look for in your logs
- How to handle and preserve log files for human resources issues and legal matters
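One way to picture the baselining mentioned under log analysis is to count messages per program on a normal day and then flag programs that are new or unusually busy. The sketch below assumes a simplified "host program: message" line layout and an arbitrary threshold factor of 3; both are illustrative assumptions rather than anything prescribed by the tutorial.

```python
from collections import Counter

def count_by_program(lines):
    """Count messages per program, assuming 'host program: message' lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) >= 2:
            counts[parts[1].rstrip(":")] += 1
    return counts

def unusual(baseline, today, factor=3):
    """Return programs that are absent from the baseline or at least
    `factor` times busier than their baseline count."""
    flagged = []
    for program, n in sorted(today.items()):
        if program not in baseline or n >= factor * baseline[program]:
            flagged.append(program)
    return flagged

# Hypothetical log lines for a quiet day and for today.
baseline = count_by_program([
    "web1 sshd: accepted",
    "web1 cron: job run",
    "web1 sshd: accepted",
])
today = count_by_program(
    ["web1 sshd: failed password"] * 6
    + ["web1 cron: job run", "web1 su: session opened"]
)
flagged = unusual(baseline, today)
# flagged == ["sshd", "su"]
```

A production setup would build the baseline from days or weeks of data and would likely compare rates per time window, but the comparison step follows the same shape.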
Tuesday, June 16, 2009
Tuesday Full-Day Tutorials
T2 Virtualization with VMware ESX 3i for UNIX Administrators:
Dan Anderson, VMware
Who should attend: System administrators and architects who are interested in deploying a VMware Virtual Infrastructure, including ESX Server and VirtualCenter, in a production environment. No prior experience with VMware
products is required. Knowledge of Linux is helpful; basic knowledge
of SANs is useful but not required.
VMware Infrastructure is the new computing platform from VMware.
It helps organizations solve a range of computing challenges. This
workshop will provide an overview of VMware Infrastructure by
focusing on ESXi 3.5 and VirtualCenter. ESXi 3.5 has only a 32MB
footprint and runs independently of a general-purpose operating system.
We will discuss the Remote Command Line Interface (RCLI), which will be the
primary command line tool to manage an ESXi 3.5 system. Additionally,
we will provide an overview of VMI (Virtual Machine Interface), a
guest OS communication interface with the hypervisor.
Take back to work: An understanding of ESXi 3.5 and VirtualCenter installation, configuration, and basic design architectures around networking and storage.
- Virtualization overview
- ESX 3i Installation and Configuration
- Networking overview and configuring vSwitches
- Storage overview and configuring datastores
- RCLI for the UNIX administrator
- VMI 101
- Virtual machines, virtual appliances, and the OVF
- Clusters, Resource Pools and VMware HA, VMware DRS
T3 Replacing Real Servers with Virtual Machines
Using Amazon Elastic Compute Cloud
and Simple Storage Service (S3)
David J. Malan, Harvard
Who should attend: Instructors who want more control over their course's
infrastructure, who want to provide each of their students
with their own virtual machine, or who want to assign
projects with high computational or space needs; CTOs who want to scale their infrastructure within minutes
to meet unusual loads or who want to load-test their own
infrastructure by simulating unusual loads; and system administrators who want their own server or cluster
without yet another box under their desk.
Take back to work: How to do it, and whether it's the right thing for you to do.
- Spawning and managing Amazon EC2 instances
- Evaluating EC2's costs (in dollars and man-hours)
- Amazon's command-line utilities
- Using others' images
- Burning your own images for others to use
- Backing up your data to S3
- Spawning Fedora-, Ubuntu-, and Windows-based VMs
- Time-saving management tools
- Commercial add-ons: RightScale, etc.
- How to do it at no cost (for academic purposes)
Bring to class:
- A laptop with wireless access is required.
T4 Inside the Linux 2.6 Kernel
Theodore Ts'o, IBM/Linux Foundation
Who should attend: Application programmers, system administrators
interested in performance tuning their Linux systems, and kernel
developers. You should be somewhat familiar with C programming in
the UNIX environment, but no prior experience with the UNIX or Linux
kernel code is assumed.
The Linux kernel aims to achieve conformance with existing standards and compatibility with existing operating systems; however, it is not a reworking of existing UNIX kernel code. The Linux kernel was written from scratch to provide both standard and novel features, and it takes advantage of the best practice of existing UNIX kernel designs.
This class will primarily focus on the currently released version of the Linux 2.6 kernel, but it will also discuss how it has evolved from Linux 2.4 and earlier kernels. It will not delve into any detailed examination of the source code.
Take back to work: An overview and roadmap of the kernel's design and functionality: its structure, the basic features it provides, and the most important algorithms it employs.
- How the kernel is organized (scheduler, virtual memory system, filesystem layers, device driver layers, networking stacks)
- The interface between each module and the rest of the kernel
- Kernel support functions and algorithms used by each module
- How modules provide for multiple implementations of similar functionality
- Ground rules of kernel programming (races, deadlock conditions)
- Implementation and properties of the most important algorithms
- Comparison between Linux and UNIX kernels, with emphasis on differences in algorithms
- Details of the Linux scheduler
- The virtual memory subsystem
- Linux's virtual file system layer
- A quick tour through Linux's networking stack
T6 Performance Tools, Metrics, and Tuning for Solaris/Linux
Adrian Cockcroft, Netflix, Inc.
Who should attend: Capacity planning engineers and sysadmins with an interest in performance optimization and who work with Solaris or Linux.
Capacity planning and performance management tools have been
commercially available for many years. A new generation of freely
available tools provides data collectors and analysis packages. As
the underlying computer platforms and network devices have evolved,
they have added improved data sources and have bundled free data
collectors. Several open source and freeware projects have sprung
up to collect and display cross-platform data, and with the advent
of highly functional free statistics and modeling packages, comprehensive
analysis, modeling, and archival storage can now be assembled. Free
and bundled tools are of special interest to sites with very diverse
mixes of systems, very large sites where licensing costs become
prohibitive, and sites replacing a few large single systems with
many more low-cost, horizontally scaled systems.
The morning session provides a vendor- and operating
system-independent introduction to capacity planning techniques and
The afternoon session will focus on the measurement sources and tuning
parameters available in Solaris and Linux. The meaning and behavior of metrics is covered
Take back to work: A vendor- and OS-independent understanding of capacity planning techniques and tools, an understanding of the meaning and behavior of metrics, and knowledge of the common fallacies, misleading indicators, sources of measurement error, and other traps for the unwary.
- Computer system and network performance data collection, analysis, modeling, and capacity planning on any platform using bundled utilities and freely available tools such as Orca, Big Brother, OpenNMS, Nagios, Ganglia, SE Toolkit, R, Ethereal/Wireshark, Ntop, MySQL and PDQ
- TCP/IP measurement and tuning
- Complex storage subsystems
- Advanced Solaris metrics
Tuesday Morning Half-Day Tutorial
T7 Disk-to-Disk Backup and Eliminating Backup System Bottlenecks
UPDATED FOR 2009!
Jacob Farmer, Cambridge Computer Services
Who should attend: System administrators involved in the design and management of backup systems and policymakers responsible for protecting their organization's data. A general familiarity with server and storage hardware is assumed. The class focuses on architectures and core technologies and is relevant regardless of what backup hardware and software you currently use.
The data protection industry is going through a mini-renaissance. In the past few years, the cost of disk media has dropped to the point where it is practical to use disk arrays in backup systems, thus minimizing and sometimes eliminating the need for tape. In the first incarnations of disk-to-disk backup (disk staging and virtual tape libraries), disk has been used as a direct replacement for tape media. While this compensates for the mechanical shortcomings of tape drives, it fails to address other critical bottlenecks in the backup system, and thus many disk-to-disk backup projects fall short of expectations. Meanwhile, many early adopters of disk-to-disk backup are discovering that the long-term costs of disk staging and virtual tape libraries are prohibitive.
The good news is that the next generation of disk-enabled data protection solutions has reached a level of maturity where they can assist—and sometimes even replace—conventional enterprise backup systems. These new D2D solutions leverage the random access properties of disk devices to use capacity much more efficiently and to obviate many of the hidden backup-system bottlenecks that are not addressed by first-generation solutions. The challenge to the backup system architect is to cut through the industry hype, sort out all of these new technologies, and figure out how to integrate them into an existing backup system.
This tutorial identifies the major bottlenecks in conventional backup systems and explains how to address them. The emphasis is placed on the various roles for inexpensive disk in your data protection strategy; however, attention is given to SAN-enabled backup, the current state and future of tape drives, and iSCSI.
Take back to work: Ideas for immediate, effective, inexpensive improvements to your backup systems.
- Identifying and eliminating backup system bottlenecks
- Conventional disk staging
- Virtual tape libraries
- Removable disk media
- Incremental forever and synthetic full backup strategies
- Block- and object-level incremental backups
- Information lifecycle management and nearline archiving
- Data replication
- CDP (Continuous Data Protection)
- Current and future tape drives
- Capacity Optimization (Single-Instance File Systems)
- Minimizing and even eliminating tape drives
Tuesday Afternoon Half-Day Tutorial
T8 Next-Generation Storage Networking
UPDATED FOR 2009!
Jacob Farmer, Cambridge Computer Services
Who should attend: Sysadmins running day-to-day operations and those who set or enforce budgets. This tutorial is technical in nature, but it does not address command-line syntax or the operation of specific products or technologies. Rather, the focus is on general architectures and various approaches to scaling in both performance and capacity. Since storage networking technologies tend to be costly, there is some discussion of the relative cost of different technologies and of strategies for managing cost and achieving results on a limited budget.
There has been tremendous innovation in the data storage industry over the past few years. Proprietary, monolithic SAN and NAS solutions are beginning to give way to open-system solutions and distributed architectures. Traditional storage interfaces such as parallel SCSI and Fibre Channel are being challenged by iSCSI (SCSI over TCP/IP), SATA (serial ATA), SAS (serial attached SCSI), and even InfiniBand. New filesystem designs and alternatives to NFS and CIFS are enabling high-performance filesharing measured in gigabytes (yes, "bytes," not "bits") per second. New spindle management techniques are enabling higher-performance and lower-cost disk storage. Meanwhile, a whole new set of efficiency technologies is allowing storage protocols to flow over the WAN with unprecedented performance. This tutorial is a survey of the latest storage networking technologies, with commentary on where and when these technologies are most suitably deployed.
Take back to work: An understanding of general architectures, various approaches to scaling in both performance and capacity, relative costs of different technologies, and strategies for achieving results on a limited budget.
- Fundamentals of storage virtualization: the storage I/O path
- Shortcomings of conventional SAN and NAS architectures
- In-band and out-of-band virtualization architectures
- The latest storage interfaces: SATA (serial ATA), SAS (serial attached SCSI), 4Gb Fibre Channel, InfiniBand, iSCSI
- Content-Addressable Storage (CAS)
- Information Life Cycle Management (ILM) and Hierarchical Storage Management (HSM)
- The convergence of SAN and NAS
- High-performance file sharing
- Parallel file systems
- SAN-enabled file systems
- Wide-area file systems (WAFS)