Full Training Program
Half Day Morning
Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. As Division Leader, Gary is responsible for all aspects of High Performance Computing technologies and deployment at Los Alamos. Gary also manages the R&D portfolio that keeps the new technology pipeline full, funding university and industry partners to provide solutions to problems in the Lab’s HPC environment.
Gary is also the US Department of Energy Exascale Storage, IO, and Data Management National Co-Coordinator. In this role, Gary helps manage the U.S. government investments in Data Management, Mass Storage, and IO. Gary has 30 active patents/applications in the data storage area and has been working in HPC and HPC-related storage since 1984.
John Bent, currently of EMC, soon to be of Dell, formerly of Los Alamos National Lab, has been working on storage systems for over 20 years. After completing his data-aware scheduling dissertation at Wisconsin in 2005, John spent the next 10 years working for Gary designing, maintaining, and measuring some of the world's largest parallel storage systems. Now at EMC, John works in the Office of the CTO helping design and map EMC storage products to emerging workloads in both Enterprise and Extreme IO.
Some of John’s more influential research has been the Parallel Log-structured File System and the DOE-sponsored FastForward project prototyping an exascale storage system with Intel and The HDF Group. John is a former anthropology major who spent two years spearfishing on the equator while working as a Peace Corps volunteer.
Mark Gary is a Deputy Division Leader for the Livermore Computing Division within Computations. In this role, Mark is responsible for the 24x7 operation of LLNL's world-class computing environment. Livermore Computing provides reliable high performance computers, infrastructure, and services (networks, data archive, operations, file systems, system software, visualization, system administration, user assistance and consultation) in support of LLNL missions. Mark leads projects ranging from integrated LC planning efforts to external collaborations in support of extreme scale computing and storage futures.
Mark has worked on all aspects of High Performance Computing at Livermore over the last 31 years. While the primary focus of his work has been on mass storage and parallel file systems, Mark has also worked on operating system, driver, and kernel development. He is a co-author of the HPSS and UniTree archival storage systems. Mark has co-managed successful government/industry collaborations over the last three decades and has led archival storage and Lustre file system development and operations teams.
Mark received his B.S. in Computer Science from the University of California, Santa Barbara, in 1984.
Nicholas Lewis is a Ph.D. candidate in the History of Science, Technology, and Medicine Program at the University of Minnesota, Twin Cities. He received a master's in history from the University of Utah in 2011, and has undergraduate degrees in history and anthropology from Weber State University. He worked in IT before joining the Charles Babbage Institute's NSF History of Computer Security Project as a graduate research assistant. He currently works as a GSRA on the History of Supercomputing Project, a collaborative effort between CBI and the High-Performance Computing Division at Los Alamos National Laboratory, where he is currently conducting dissertation research.
In this tutorial, we will introduce the audience to the lunatic fringe of extreme high-performance computing and its storage systems. The most difficult challenge in HPC storage is caused by millions (soon to be billions) of simultaneously writing threads. Although cloud providers handle workloads of comparable, or larger, aggregate scale, the HPC challenge is unique because the concurrent writers are modifying shared data.
We will begin with a brief history of HPC covering the previous few decades, bringing us into the petaflop era, which started in 2009. Then we will discuss the unique computational science in HPC so that the audience can understand why its unique storage challenges are unavoidable. We will then move into a discussion of archival storage and the hardware and software technologies needed to store today’s exabytes of data forever. From archive we will move into the parallel file systems of today and will end the lecture portion of the tutorial with a discussion of anticipated HPC storage systems of tomorrow. Of particular focus will be namespaces handling concurrent modifications to billions of entries, as this is what we believe will be the largest challenge in the exascale era.
The tutorial will end with a free-ranging, audience-directed panel.
- A brief history lesson about the past 30 years of supercomputers
- An understanding of what makes HPC unique and the storage challenges this entails
- An overview of current HPC storage technologies such as burst buffers, parallel file systems, and archival storage
- A glimpse into the future of HPC storage technologies for both hardware and software
- Insights into unique research opportunities to advance HPC storage
Brent Welch is a senior staff software engineer at Google, where he works on their public Cloud platform. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.
This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.
Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.
The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.
Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Ceph, Lustre, GPFS, PanFS, HDFS (Hadoop File System), and OpenStack.
Half Day Afternoon
Dean Hildebrand is a Research Staff Member and Master Inventor at the IBM Almaden Research Center and a recognized expert in scalable file systems and object stores. He has authored numerous scientific publications, created over 30 patents, and been the technical program chair and sat on the program committee of numerous conferences. Dr. Hildebrand pioneered pNFS, demonstrating the feasibility of providing standard and scalable access to any file system. He received a B.Sc. degree in computer science from the University of British Columbia in 1998 and M.S. and Ph.D. degrees in computer science from the University of Michigan in 2003 and 2007, respectively.
Bill Owen is a Senior Engineer with the IBM Spectrum™ Scale development team. He is responsible for the integration of OpenStack with Spectrum Scale, focusing on the Swift object, Cinder block, and Manila file storage components of OpenStack. He has worked in various development roles within IBM for over 15 years. Before joining IBM, Bill developed and deployed grid management systems for electric utilities. Bill holds B.Sc. and M.S. degrees in Electrical Engineering from New Mexico State University.
This tutorial will provide a technical overview of the latest distributed file and object access protocols. The goal is to provide administrators and developers with the knowledge to choose the best data access protocol for their new applications or determine if their existing file-based applications are good candidates for being ported to using an object access protocol.
For decades, distributed file systems such as NFS have been the sole method for applications to work with remote data. The emergence of mobile devices, tablets, and the Internet of Things, combined with the global demand for cloud storage, has given rise to numerous new object storage access protocols. While these new protocols are simpler in many ways, and offer several new features, they also come with their own set of access semantics that may cause problems for applications.
We will cover and contrast NFSv4/v4.1 with both the S3 and Swift object protocols, as well as discuss the challenges of providing both file and object access to a single dataset, including such topics as common identity, ACL, and quota management.
Jason Resch has 17 years of professional software engineering experience and is presently a Software Architect at Cleversafe, Inc., a company that pioneered applying Erasure Codes to Object Storage. In his nine years at Cleversafe, Jason specialized in developing new algorithms to improve Erasure Code performance and security and techniques for rebuilding Erasure Coded data. He has 133 issued and 310 pending patents as well as numerous technical conference presentations and published journal papers. Jason graduated from Illinois Institute of Technology in 2006 with a B.S. in Computer Science with a specialization in information security and a minor in psychology. He was recently awarded an IIT Outstanding Young Alumnus Award and is listed in Crain's Chicago Business Tech 50 list (2015).
W. David Schwaderer presently consults for Silicon Valley enterprises, many of them specializing in data storage technologies. As a multidisciplinary technologist, he has authored 11 technical books on a wide spectrum of topics, including data storage systems, data management, communication signaling, C Language programming, ASIC core interfacing, and Digital Image Processing. David has presented at IEEE and USENIX conferences, Stanford, MIT, Intel, Google, Sun/Oracle Labs, and across greater Silicon Valley. His four innovation Google TechTalks on YouTube have recorded over 40,400 views. David has a master's degree in Applied Mathematics from the California Institute of Technology and an MBA from the University of Southern California. At his recent Joint IEEE Comsoc-CEsoc SCV presentation titled "Broadcast Storage for Video-Intensive Worlds", he was accorded the title "Silicon Valley Icon."
It's common knowledge that the volume of global data has exploded. Simultaneously, the challenge to store, protect, and access this data securely "at scale" has produced hyperscale hardware and software architectures that continue to subduct traditional enterprise datacenter systems. These new architectures will prove essential in responding to the unrelenting global "data tsunami".
One important hyperscale data storage methodology is Object Storage. Object Storage often uses Erasure Coding as a means to reduce data loss probabilities while simultaneously economizing data storage capital costs. Erasure Coding's powerful principles are also found in numerous other data retention methodologies, including Information Dispersal Algorithm (IDA) deployments and Secret Sharing, a method of providing shared-data security.
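As a rough illustration of the principle described above, the following Python 3 sketch shows the simplest possible erasure code: a single XOR parity fragment that allows any one lost fragment to be rebuilt. It is not taken from the tutorial materials; the function names (`xor_bytes`, `encode`, `recover`) and the sample data are hypothetical, and production object stores typically use stronger codes that tolerate multiple simultaneous losses.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(fragments):
    """Return the data fragments followed by one XOR parity fragment."""
    parity = reduce(xor_bytes, fragments)
    return list(fragments) + [parity]

def recover(stored, lost_index):
    """Rebuild the fragment at lost_index by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(stored) if i != lost_index]
    return reduce(xor_bytes, survivors)

# Hypothetical example: three 4-byte data fragments plus one parity fragment.
data = [b"obje", b"ct s", b"tore"]
stored = encode(data)

# Pretend fragment 1 was lost and rebuild it from the other three.
print(recover(stored, 1))  # prints b'ct s'
```

Here four fragments are stored for three fragments of data, and the loss of any single fragment (data or parity) is survivable.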
Unfortunately, understanding Erasure Coding's deployment strategies and powerful foundations can quickly prove challenging, if not impossible, because Erasure Coding's simple principles are typically steeped in academic obfuscation. This has historically presented impenetrable obstacles to many engineers. Luckily, that's totally unnecessary.
The first part of this tutorial will provide a brief Object Storage and Erasure Coding introduction as a backdrop for a deep exploration of effective Erasure Coding deployment strategies, including performance and bandwidth tradeoff considerations. It will also introduce IDA and Secret Sharing and briefly discuss their relation to Erasure Coding.
After an intermission, the second part of the tutorial will provide a programming lab which exercises running Python 2.7 programs distributed on the FAST '16 Tutorial Sessions USB thumb drive. This lab should help cement Erasure Code principles and deployment considerations as well as provide demonstrations of their utility. As an example, the programs will illustrate Erasure Code operations using tables as well as on-the-fly calculations—useful in configurations where it is necessary to trade processing cycles for addressable memory.
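The tradeoff between table-driven and on-the-fly calculation mentioned above can be sketched as follows. This is not one of the lab programs, which are not reproduced here; it is a minimal illustration that assumes the field GF(2^8) with the reduction polynomial 0x11D and generator 2, a common choice in Reed-Solomon implementations. The names `gf_mul_onthefly`, `gf_mul_table`, `EXP`, and `LOG` are hypothetical.

```python
# Assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
PRIM_POLY = 0x11D

def gf_mul_onthefly(a, b):
    """Multiply in GF(2^8) by shift-and-XOR; uses no lookup tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:          # degree reached 8: reduce by the polynomial
            a ^= PRIM_POLY
        b >>= 1
    return result

# Precompute antilog (EXP) and log (LOG) tables for generator 2.
# EXP is doubled in length so LOG[a] + LOG[b] never needs a modulo.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_onthefly(x, 2)
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_mul_table(a, b):
    """Multiply in GF(2^8) with two log lookups and one antilog lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf_mul_onthefly(2, 0x80))  # prints 29
print(gf_mul_table(2, 0x80))     # prints 29
```

The table version spends several hundred entries of memory to replace the loop with lookups, while the on-the-fly version needs no tables at the cost of up to eight shift-and-XOR steps per multiplication.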
This tutorial portion will conclude with an intense but extremely accessible discussion of Erasure Coding principles that will be of interest to attendees desiring a deeper understanding of how Erasure Codes achieve their results. This material will be devoid of the impenetrable mathematical jargon typically prevalent in Erasure Code literature. The discussion progressively examines various Galois Finite Fields in detail, with a brief discussion of GF(2^16).
Finally, the tutorial will include discussion from the forthcoming book titled Exabyte Data Preservation, Postponing the Inevitable, co-authored by the speakers and Dr. Ethan Miller of University of California, Santa Cruz.
- Brief Object Storage Introduction
- Erasure Coding and Object Storage
- Erasure Coding Deployment Strategy and Tradeoff Considerations
- Information Dispersal Algorithm and Secret Sharing
- Understanding Galois Finite Fields
- Galois Finite Field Computations (made extremely accessible)
- Python 2.7 Galois Finite Field Computation Demonstration Programs
- Python 2.7 programming lab