Full Training Program
Half Day Morning
Eno Thereska is a Researcher at Microsoft Research in Cambridge, UK. He has broad interests in computer systems. He has over 30 academic publications in top conferences in the field of storage systems and operating systems, including FAST, OSDI, SOSP, SIGMETRICS and CHI. He served as technical co-chair of the USENIX Conference on File and Storage Technologies (FAST '14). Eno is a recipient of the 2014 IEEE William R. Bennett Prize, the IEEE INFOCOM 2011 Best Paper Award, and the USENIX FAST Best Paper and Best Student Paper awards in 2005 and 2004 respectively. He graduated with a Ph.D. from Carnegie Mellon University in 2007.
Greg O'Shea is a software engineer in the Systems and Networking group at Microsoft Research, Cambridge, UK. He has worked extensively on developing and evaluating experimental network and storage systems and has published his findings widely in SIGCOMM, NSDI, MobiCom, OSDI, and SOSP. Greg’s work has been incorporated into several Microsoft products such as Windows, Hyper-V, and Windows Server. His latest work is on Storage Quality of Service and is included in Windows Server Technical Preview. He has also developed the Microsoft Research Storage Toolkit, a development kit for software-defined storage. Greg has a Ph.D. from London University.
Grand Ballroom B
This tutorial will provide technical background on the (often vague) concept of software-defined storage (SDS). The technical contribution of this tutorial is a definition of SDS that builds on recent work in network systems and applies it to storage. This work includes basic concepts such as classification, routing and forwarding, and the separation of control and data planes. Surprisingly, these basic concepts do not apply well to the storage stack today, making it difficult to enforce end-to-end storage policies.
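To make the networking vocabulary above concrete, here is a minimal Python sketch of how classification, forwarding, and the control/data plane split might look when applied to storage requests. It is an illustration only and is not taken from the tutorial or the Microsoft Research Storage Toolkit; every name in it (`IORequest`, `Rule`, `DataPlaneStage`, `Controller`, and the targets `ssd-cache` and `shared-hdd`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class IORequest:
    """A storage request as seen by one stage of the I/O stack (hypothetical fields)."""
    tenant: str
    path: str
    op: str    # "read" or "write"
    size: int  # bytes


@dataclass
class Rule:
    """A match-action rule installed by the control plane."""
    match: Callable[[IORequest], bool]
    target: str  # name of the destination the request is forwarded to


class DataPlaneStage:
    """Data plane: classifies each request and forwards it; holds no policy logic."""

    def __init__(self, default_target: str) -> None:
        self.rules: List[Rule] = []
        self.default_target = default_target

    def forward(self, req: IORequest) -> str:
        # Classification: the first matching rule decides where the request goes.
        for rule in self.rules:
            if rule.match(req):
                return rule.target
        return self.default_target


class Controller:
    """Control plane: turns a high-level policy into rules and installs them in stages."""

    def __init__(self, stages: List[DataPlaneStage]) -> None:
        self.stages = stages

    def route_tenant(self, tenant: str, target: str) -> None:
        rule = Rule(match=lambda r, t=tenant: r.tenant == t, target=target)
        for stage in self.stages:
            stage.rules.append(rule)


# Assumed policy for illustration: one tenant's I/O goes to a faster tier.
stage = DataPlaneStage(default_target="shared-hdd")
Controller([stage]).route_tenant("analytics", "ssd-cache")

fast = stage.forward(IORequest("analytics", "/data/a.parquet", "read", 4096))
slow = stage.forward(IORequest("backup", "/data/b.tar", "write", 1 << 20))
```

The point of the split is that the policy ("route this tenant's I/O to a faster tier") lives only in the controller, while each stage simply matches and forwards.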
There will be a short, hands-on exercise that requires Windows 8.1.
Download the presentation slides (PPTX) for this tutorial.
Brent Welch is a senior staff software engineer at Google. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through network servers, user applications, and graphical user interfaces. While getting his Ph.D. at the University of California, Berkeley, Brent designed and built the Sprite distributed file system. He is the creator of the TclHttpd web server, the exmh email user interface, and the author of Practical Programming in Tcl and Tk.
Grand Ballroom C
This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.
Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.
The tutorial starts with a look at storage devices and SSDs in particular, which are growing in importance in all storage systems. Next we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.
Topics include SSD technology, scaling the data path, scaling metadata, fault tolerance, manageability, and cloud storage. Specific systems are discussed, including Lustre, GPFS, PanFS, HDFS (Hadoop Distributed File System), OpenStack, and the NFSv4.1 standard for parallel I/O.
Half Day Afternoon
Sam H. (Hyuk) Noh received his B.S. in Computer Engineering from Seoul National University in 1986, and his Ph.D. from the Department of Computer Science, University of Maryland, College Park in 1993. He has been a professor at the School of Computer and Information Engineering at Hongik University in Seoul, Korea since 1994. He has worked on various software issues pertaining to flash memory since 1999, having authored numerous papers and holding numerous patents in that area. He has served as General Chair, Program Chair, and Program Committee Member for a number of technical conferences and workshops including the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), the IEEE International Conference on Parallel and Distributed Systems (ICPADS), the USENIX Conference on File and Storage Technologies (FAST), and the International World Wide Web (WWW) Conference. He also serves as Associate Editor of ACM Transactions on Storage. His other current research interests include operating system issues pertaining to non-volatile memory, such as PCM and STT-MRAM.
Dr. Yang-Suk Kee (Yang Seok Ki) is a director and architect of the Memory Solutions Lab, Samsung Semiconductor Inc. America. He leads the Advanced Datacenter Solutions group, whose main focus is to innovate the SSD ecosystem, and drives a storage-centric computing paradigm called Smart SSD. Before joining Samsung, he worked for the Oracle server technology group, which builds a distributed database server system, and contributed to the Oracle 12c release. Prior to his industrial experience, he worked on HPDC (High Performance Distributed Computing), Grid, and Cloud research at the Information Sciences Institute of the University of Southern California and the Center for Networked Systems at the University of California, San Diego. He received his Ph.D. in Electrical Engineering and Computer Science, specializing in parallel processing, and his M.S. and B.S. degrees in Computer Engineering, all from Seoul National University, Korea.
Grand Ballroom B
This tutorial will be a crash course on flash memory. We will cover the major ground related to flash memory-based products, starting from the intrinsic characteristics of flash memory devices, moving up to the FTL firmware that controls the flash memory devices, and then finally up to the system software layer that makes use of these flash memory-based end products. We start off covering the history and the very basics of each layer. We then discuss the recent trends in each of the layers. We will also discuss how each of the layers differs across the various flash products that are commercially available. We will also attempt to untangle the close-knit relationship among the system, software, interface, and the market that together result in the flash memory-based end products and the software systems that make use of these end products.
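As a rough illustration of what the FTL layer mentioned above does, the following Python sketch models a toy page-mapping FTL: because a flash page cannot be overwritten in place, each update is written to a fresh physical page and the logical-to-physical map is redirected. This is a simplified sketch and not material from the tutorial; the class `SimpleFTL` and its fields are hypothetical, and garbage collection, wear leveling, and block erasure are deliberately left out.

```python
class SimpleFTL:
    """Toy page-mapping FTL: out-of-place writes, no garbage collection."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.total_pages = num_blocks * pages_per_block
        self.flash = [None] * self.total_pages  # contents of each physical page
        self.mapping = {}                       # logical page -> physical page
        self.invalid = set()                    # stale physical pages awaiting erase
        self.next_free = 0                      # next never-programmed physical page

    def write(self, lpn: int, data: bytes) -> int:
        """Write a logical page and return the physical page that now holds it."""
        if self.next_free >= self.total_pages:
            raise RuntimeError("no free pages; a real FTL would garbage-collect here")
        old = self.mapping.get(lpn)
        if old is not None:
            # The previous copy cannot be updated in place, so mark it stale.
            self.invalid.add(old)
        ppn = self.next_free
        self.flash[ppn] = data
        self.mapping[lpn] = ppn
        self.next_free += 1
        return ppn

    def read(self, lpn: int):
        """Return the latest data for a logical page, or None if never written."""
        ppn = self.mapping.get(lpn)
        return None if ppn is None else self.flash[ppn]


ftl = SimpleFTL(num_blocks=2, pages_per_block=4)
first = ftl.write(0, b"v1")   # first version of logical page 0
second = ftl.write(0, b"v2")  # the update goes to a different physical page
```

After the two writes, logical page 0 reads back the newer data while the older physical page is only marked stale; reclaiming such pages is the job of the garbage collector that a real FTL adds.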
Grand Ballroom C
During the first half of the tutorial, we will provide an intro to Apache Hadoop and the ecosystem. In the second half, we will show, using an end-to-end application of clickstream analytics, how users can:
- Model data in Hadoop and select optimal storage formats for data stored in Hadoop
- Move data between Hadoop and external systems such as relational databases and logs
- Access and process data in Hadoop
- Orchestrate and schedule workflows on Hadoop
Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered.
Students should bring laptops with a copy of the Cloudera QuickStart VM (or access to a working alternate VM or Hadoop cluster). The VM can be downloaded from here.
These are 64-bit VMs. They require a 64-bit host OS and a virtualization product that can support a 64-bit guest OS.
To use a VMware VM, you must use a player compatible with Workstation 8.x or higher: Player 4.x or higher, ESXi 5.x or higher, or Fusion 4.x or higher. Older versions of Workstation can be used to create a new VM using the same virtual disk (VMDK file), but some features in VMware Tools won't be available.
| CDH and Cloudera Manager Version | RAM Required by VM | File Size |
|---|---|---|
| CDH 5 and Cloudera Manager 5 | 4 GB | 3 GB |
| CDH 4, Cloudera Impala, Cloudera Search, and Cloudera Manager 4 | 4 GB | 2 GB |