FAST '19 Training Program

Monday, February 25, 2019

Half-Day Morning Session

Morning Tutorial 1: Understanding Large Scale Storage Systems

Monday, 9:00 am–12:30 pm

Constitution Ballroom A

Brent Welch, Google

This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices, including traditional hard drives, SSDs, and new non-volatile memory devices. Next, we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

Topics include:

SSD technology
NVRAM
Scaling the data path
Scaling metadata
Fault tolerance
Manageability
Cloud storage

Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver level up through operating systems, network services, user applications, and graphical user interfaces. While getting his Ph.D. at UC Berkeley, Brent designed and built the Sprite distributed file system. While at Panasas, he helped build the PanFS cluster file system. He is the creator of the TclHttpd web server and the exmh email user interface, and the author of Practical Programming in Tcl and Tk.

Morning Tutorial 2: Blockchain and Storage

Monday, 9:00 am–12:30 pm

Constitution Ballroom B

Mike Ault, IBM

This tutorial will cover the basics of blockchain, the issues blockchain has concerning storage, database usage with blockchain, and the solutions to the storage issues.

Topics include:

The Basics of Blockchain
Storage Issues for Blockchain
Using Databases for Off-Chain Storage
Blockchain Deployment and Backup/Recovery

Mike Ault began work in the nuclear navy and moved into the civilian nuclear field in 1979. He has been working with computers and databases since 1980. In 1990, Mike started working with the Oracle database system. Mike has worked with flash systems since 2007, when he began consulting for TMS on the use of Oracle with flash. In 2009, Mike joined TMS as their Oracle and FlashSystem (RamSan) evangelist. When IBM purchased TMS in 2012, Mike came along and has been working with FlashSystem at IBM ever since, first with Oracle, then with financial systems, and now with blockchain. Mike has written or co-written 26 Oracle- and flash-related books and is the author of many articles, whitepapers, and presentations. Mike is a frequent presenter at Oracle and IBM conferences and was the winner of a best tutorial award from SNIA.

Half-Day Afternoon Session

Afternoon Tutorial 1: Advanced Persistent Memory Programming

Monday, 1:30 pm–5:00 pm

Constitution Ballroom A

Andy Rudoff, Intel, and Tom Talpey, Microsoft

Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and across NVDIMM, PCI Express, storage-attached, and network-attached interconnects, it is broadly available across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

The tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA NonVolatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and language and compiler approaches. The software PM landscape is already rich and growing.

Additionally, the tutorial will explore the considerations when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access. While the programming paradigms remain common, the implications for latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and the application developer or designer.

Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.

Topics include:

Persistent Memory
Persistent Memory Technologies
Remote Persistent Memory
Programming Interfaces
Operating Systems
Open Source Libraries
RDMA

Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years of industry experience includes design and development work in operating systems, file systems, networking, and fault management at companies large and small, including Sun Microsystems and VMware. Andy has taught various Operating Systems classes over the years and is a co-author of the popular UNIX Network Programming textbook.

Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency remote storage solution, merging the groundbreaking advancements in network and storage-class memory latency. He has over 30 years of industry experience in operating systems, network stacks, network filesystems, RDMA and storage, and is a longtime presenter and instructor at diverse industry events.

Afternoon Tutorial 2: Caches in the Modern Memory Hierarchy with Persistent Memory and Flash

Monday, 1:30 pm–5:00 pm

Constitution Ballroom B

Irfan Ahmad, CachePhysics, and Ymir Vigfusson, Emory University

For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy improvements has been highly motivated and continues with strong improvements to this day. However, there are certain areas in cache algorithm optimization that have only recently experienced breakthroughs.

In this tutorial, we will start by reviewing the history of caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies, which are getting more complex and deeper due to hardware innovations. These hierarchies motivate revisiting multi-tier algorithms. We will then review a key tool in cache research and management, the cache utility curve, along with recent literature that has made such curves easier to compute. Using this tool, we will dig into caching policies and their trade-offs. We will also spend some time thinking about optimality for caches in modern memory hierarchies with DRAM, non-volatile/persistent memory, and flash.
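As a rough illustration of what a cache utility curve captures, the sketch below computes an exact miss ratio curve for an LRU cache from an access trace, using the classic stack-distance approach. It is not taken from the tutorial materials: the function name `lru_miss_ratio_curve`, the toy trace, and the choice of LRU are all assumptions made for the example, and the quadratic-time implementation is meant for clarity rather than for production traces.

```python
from collections import OrderedDict


def lru_miss_ratio_curve(trace, max_size):
    """Return LRU miss ratios for cache sizes 1..max_size (hypothetical helper).

    Uses stack distances: a reference hits in an LRU cache of size c
    exactly when its stack distance is at most c.
    """
    trace = list(trace)
    if not trace:
        return []

    stack = OrderedDict()  # keys ordered from least to most recently used
    hits_at_distance = [0] * (max_size + 1)  # index d counts hits at distance d

    for key in trace:
        if key in stack:
            keys = list(stack)
            # Distance 1 means the key was the most recently used entry.
            distance = len(keys) - keys.index(key)
            if distance <= max_size:
                hits_at_distance[distance] += 1
            stack.move_to_end(key)
        else:
            stack[key] = None  # first reference: a miss at every cache size

    curve = []
    hits = 0
    for size in range(1, max_size + 1):
        hits += hits_at_distance[size]  # hits accumulate as the cache grows
        curve.append(1 - hits / len(trace))
    return curve


# Toy trace (an assumption for illustration): three hot keys plus one cold key.
example_curve = lru_miss_ratio_curve(["a", "b", "c", "a", "b", "d", "a"], 3)
```

Each entry of the returned list is the fraction of references that would miss at that cache size, so one pass over the trace yields the whole curve rather than a single point. The approximation techniques in the recent literature aim to produce similar curves at much lower cost.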

Topics include:

Overview and history of caching algorithm research and practice in industry
Introduction to new challenges posed by multi-tier memory hierarchies
Review of cache utility curves and recent literature
Experimenting with caching policies for production use cases
How to find the optimal cache

Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, a pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor for flagship products including Storage DRS and Storage I/O Control. Before VMware, Irfan worked on the Crusoe software microprocessor at Transmeta.

Irfan is an inventor on more than 35 patents. He has published at ACM SOCC, FAST, USENIX ATC, and IEEE IISWC, including two Best Paper Awards. Irfan has chaired HotStorage, HotCloud and VMware’s R&D Innovation Conference. He serves on steering committees for HotStorage, HotCloud, and HotEdge. Irfan has served on program committees for USENIX ATC, FAST, MSST, HotCloud, and HotStorage, among others, and as a reviewer for the ACM Transactions on Storage.

Ymir Vigfusson has been Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company Syndis since 2013. Ymir completed his Ph.D. in Computer Science at Cornell University in 2010, where his dissertation on "Affinity in Distributed Systems" was nominated for the ACM Doctoral Dissertation Award.

His primary research interests are in distributed systems and caching, having worked on cache replacement in the IBM WebSphere eXtreme Scale at IBM Research (2009–2011), and more recently as part of his NSF CAREER program on "Rethinking the Cache Abstraction." He has published at conferences that include ACM SOCC, USENIX ATC, VLDB, and EuroSys, as well as in ACM TOCS. Ymir served on the steering committee of LADIS (2010–2018) and has been on program committees for ACM SOCC, ICDCS, EuroSys, and P2P. In addition to caching, Ymir also works on improving epidemiological surveillance and information security, funded by the Centers for Disease Control and Prevention and grants from the Icelandic Center for Research.

Continuing Education Units (CEUs)

USENIX provides Continuing Education Units for a small additional administrative fee. The CEU is a nationally recognized standard unit of measure for continuing education and training and is used by thousands of organizations.

Two half-day tutorials qualify for 0.6 CEUs. You can request CEU credit by completing the CEU section on the registration form. USENIX provides a certificate for each attendee taking a tutorial for CEU credit. CEUs are not the same as college credits. Consult your employer or school to determine their applicability.