FAST '09


Tutorial Descriptions

  Tuesday, February 24, 2009  
  Half-Day Morning Tutorials (9:00 a.m.–12:30 p.m.)

T1 Clustered and Parallel Storage System Technologies UPDATED!
Brent Welch and Marc Unangst, Panasas

Cluster-based parallel storage technologies are now capable of delivering performance scaling from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial has two main sections. The first section describes the architecture of clustered, parallel storage systems and then compares several open-source and commercial systems based on this framework, including Panasas, Lustre, GPFS, and PVFS2. In addition, we describe the Object Storage Device (OSD) and Parallel NFS (pNFS) standards. The second half of the tutorial is about performance, including what benchmarking tools are available, how to use them to evaluate a storage system correctly, and how to optimize application I/O patterns to exploit the strengths and weaknesses of clustered, parallel storage systems.
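The benchmarking half of the tutorial can be given a flavor with a toy sequential-write throughput measurement. This is only a single-client sketch; real parallel-file-system benchmarks such as IOR coordinate many clients at once, and the function name and parameters here are illustrative, not taken from the tutorial materials.

```python
import os
import time

def write_throughput(path, total_mb=64, block_kb=1024):
    """Time sequential writes of total_mb MiB in block_kb-KiB blocks.

    Returns throughput in MiB/s. The fsync matters: without it you
    measure the page cache, not the storage system.
    """
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include time to reach stable storage
    return total_mb / (time.time() - start)
```

Running several such clients against a shared file system, and varying the block size and access pattern, is the kind of experiment the tutorial's benchmarking discussion covers in depth.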

Brent Welch is Director of Software Architecture at Panasas. Panasas has developed a scalable, high-performance, object-based distributed file system that is used in a variety of HPC environments, including many of the Top500 supercomputers. He previously worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his PhD at UC Berkeley, he designed and built the Sprite distributed file system. Brent participates in the IETF NFSv4 working group and is co-author of the pNFS Internet drafts that specify parallel I/O extensions for NFSv4.1.

Marc Unangst is a Software Architect at Panasas, where he has been a leading contributor to the design and implementation of the PanFS distributed file system. He represents Panasas on the SPEC SFS benchmark committee and authored draft specification documents for the POSIX High End Computing Extensions Working Group (HECEWG). Previously, Marc was a staff programmer in the Parallel Data Lab at Carnegie Mellon, where he worked on the Network-Attached Storage Device (NASD) project. He holds a Bachelor of Science in Electrical & Computer Engineering from Carnegie Mellon.

T2 Security and Usability: What Do We Know? NEW!
Simson Garfinkel, Naval Postgraduate School

For years we've heard that security and usability are antagonistic: secure systems aren't usable and usable systems aren't secure. New research in the field of HCI-SEC reveals this myth for what it is. In this tutorial we will review the past few years of research in security and usability and see how to create systems that are both usable and secure. We'll also discuss how to evaluate the usability of a system in the lab, in the field, and with the necessary legal approvals.

Simson L. Garfinkel is an Associate Professor at the Naval Postgraduate School in Monterey, CA, and a fellow at the Center for Research on Computation and Society at Harvard University. He is also the founder of Sandstorm Enterprises, a computer security firm that develops advanced computer forensic tools used by businesses and governments to audit their systems. Garfinkel has research interests in computer forensics, the emerging field of usability and security, information policy, and terrorism. He has actively researched and published in these areas for more than two decades. He writes a monthly column for CSO Magazine, for which he has been awarded four national journalism awards, and is the author or co-author of fourteen books on computing. He is perhaps best known for Database Nation: The Death of Privacy in the 21st Century and for Practical UNIX and Internet Security.

  Half-Day Afternoon Tutorials (1:30 p.m.–5:00 p.m.)

T3 Storage Class Memory, Technology, and Uses UPDATED!
Richard Freitas, Winfried Wilcke, Bülent Kurdi, and Geoffrey Burr, IBM Almaden Research Center

The dream of replacing the disk drive with solid-state, non-volatile random access memory is finally becoming a reality. Several technologies are under active research and development, such as advanced forms of FLASH, Phase Change Memory, and Magnetic RAM. They are collectively called Storage Class Memory (SCM). The advent of this technology will likely have a significant impact on the design of both future storage and memory systems.

This tutorial will give a rather detailed overview of the SCM device technologies being developed and how they will impact the design of storage controllers and storage systems. The device overview will emphasize technology paths to very high bit densities, which will enable low cost storage devices, ultimately becoming cost competitive with enterprise disks. The system discussion will include examples of very high I/O rate systems built with solid state storage devices.

But there is more to SCM than just its use in storage systems. SCM, by definition, is fast enough to be used as (non-volatile) main memory, complementing DRAM; we will lightly touch on how this will affect the overall system architecture and software.
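The idea of byte-addressable, non-volatile memory can be sketched with an ordinary memory-mapped file standing in for an SCM region: loads and stores go through the mapping, and an explicit flush plays the role of forcing data out of the volatile CPU caches. This is only an analogy under those assumptions; real SCM would sit on the memory bus, and the file path here is purely illustrative.

```python
import mmap
import os
import tempfile

# An ordinary file stands in for a storage class memory region.
path = os.path.join(tempfile.mkdtemp(), "scm_region.bin")
with open(path, "wb") as f:
    f.write(b"\0" * mmap.PAGESIZE)

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), mmap.PAGESIZE)  # byte-addressable view
    region[0:5] = b"hello"  # "store" via ordinary memory operations
    region.flush()          # analogous to flushing CPU caches to SCM
    region.close()

# The data survives the mapping, much as data in SCM survives power loss.
with open(path, "rb") as f:
    assert f.read(5) == b"hello"
```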

In conclusion, we believe that SCM will have a major impact on the overall memory/storage stack of future systems and will, eventually, affect software as well.

Dr. Freitas is an IBM Research Staff Member at the IBM Almaden Research Center. He received his Ph.D. degree in EECS from the University of California at Berkeley in 1976. He then joined the IBM RISC computing group at the IBM Thomas J. Watson Research Center, where he worked on the IBM 801 project. He has held various management and research positions in architecture and design for storage systems, servers, workstations, and speech recognition hardware at the IBM Almaden Research Center and the IBM T.J. Watson Research Center. His current interests include exploring the use of emerging nonvolatile solid-state memory technology in storage systems for commercial and scientific computing.

Dr. Wilcke is Program Director at the IBM Almaden Research Center. He received a Ph.D. degree in nuclear physics in 1976 from the Johann Wolfgang Goethe Universität, Frankfurt, Germany, and worked at the University of Rochester, Lawrence Berkeley Laboratory, and Los Alamos on heavy-ion and muon-induced reactions. In 1983, he joined the IBM T.J. Watson Research Center in New York, where he managed Victor and Vulcan, the first two MIMD message-passing supercomputer projects of IBM Research, which were the precursors of the very successful IBM SP supercomputers. In 1991 he joined HaL Computer Systems, initially as Director of Architecture and later as CTO. Together with Sun Microsystems, his team created the 64-bit SPARC architecture. Later, he rejoined IBM Research in San Jose, California, where he launched the IBM IceCube project, which became the first funded spinoff venture of IBM Research. Recently, Dr. Wilcke became engaged in research on storage-class memories and future systems based on such memories. In addition to his industrial work, he has published more than 100 papers, has coauthored numerous patents, and is active in aviation.

Dr. Kurdi completed his Ph.D. studies at the Institute of Optics at the University of Rochester, where he investigated silicon-based integrated optics. He holds B.S. degrees in electrical engineering and mathematics with minors in physics and philosophy from the University of Dayton. In 1989 he joined the IBM Almaden Research Center, where he has worked on integrated optical devices for magneto-optical data storage, top surface imaging techniques for the fabrication of advanced magnetic write heads, and planarization processes for magnetic head slider fabrication. He is currently the manager of the nanoscale device integration group and has been coordinating several multifaceted efforts in the area of ultra-high-density NVM devices.

Geoffrey W. Burr received his B.S. in Electrical Engineering (EE) and B.A. in Greek Classics from the State University of New York at Buffalo. He received his M.S. and Ph.D. in Electrical Engineering from the California Institute of Technology. Since that time, Dr. Burr has worked at the IBM Almaden Research Center, where he is currently a Research Staff Member. After many years as an experimentalist in volume holographic data storage and optical information processing, Dr. Burr's current research interests include nanophotonics, computational lithography, numerical modeling for design optimization, phase change memory, and other non-volatile memory. He is currently a Topical Editor for Optics Letters.

T4 Web-Scale Data Management NEW!
Christopher Olston and Benjamin Reed, Yahoo! Research

A new breed of software systems is being developed to manage and process Web-scale data sets on large clusters of commodity computers. A typical software stack includes a distributed file system (e.g., GFS, HDFS), a scalable data-parallel workflow system (e.g., MapReduce, Dryad), and a declarative scripting language (e.g., Pig Latin, Hive). These technologies are driven primarily by the needs of large Internet companies like Google, Microsoft, and Yahoo!, but they are also finding applications in the sciences, journalism, and other domains.

In this tutorial we survey Web-scale data management technologies, with special focus on open-source instances. We give concrete code examples modeled after real-world use cases at companies like Yahoo!. These technologies have not yet reached maturity; at the end of the tutorial, we discuss some "in-the-works" and "wish-list" features in this space.
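As a flavor of the programming model these systems expose, the canonical word-count example can be sketched in a single process: a map phase emits key/value pairs, a sort stands in for the distributed shuffle, and a reduce phase aggregates each key's values. The real systems run each phase in parallel across a cluster; the function names below are illustrative, not from the tutorial itself.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the mapper to each record, collecting emitted (key, value) pairs."""
    return [kv for rec in records for kv in mapper(rec)]

def reduce_phase(pairs, reducer):
    """Group pairs by key (stand-in for the shuffle/sort) and reduce each group."""
    pairs.sort(key=itemgetter(0))
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

# Word count, the canonical map-reduce example.
lines = ["web scale data", "data parallel data"]
mapped = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
counts = dict(reduce_phase(mapped, lambda key, values: (key, sum(values))))
# counts == {"data": 3, "parallel": 1, "scale": 1, "web": 1}
```

A Pig Latin script expresses the same computation declaratively (GROUP ... BY, then COUNT) and compiles down to a sequence of map-reduce jobs like this one.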

Christopher Olston is a senior research scientist at Yahoo! Research, working in the areas of data management and Web search. Olston is occasionally seen behaving as a professor, having taught undergrad and grad courses at Berkeley, Carnegie Mellon, and Stanford. He received his Ph.D. in 2003 from Stanford under fellowships from the university and the National Science Foundation.

Benjamin Reed works on distributed computing platforms at Yahoo! Research, where he is a research scientist. His projects include Pig and ZooKeeper, which are both Apache sub-projects of Hadoop. In the past he has contributed to the Linux kernel and was made an OSGi Fellow for his work on the OSGi Java framework. He received his PhD in 2000 from the University of California, Santa Cruz.


Last changed: 17 Feb. 2009 mn