Afternoon Tutorial 2: An Introduction to the Implementation of ZFS

Marshall Kirk McKusick; Benjamin Manes

Monday, February 24, 2020

Half-Day Morning Session

Morning Tutorial 1: Persistent Memory Programming on Conventional Hardware

Monday, 9:00 am–12:30 pm

Grand Ballroom D

Terence Kelly

Often overlooked in the excitement surrounding novel byte-addressable non-volatile memory (NVM) hardware is a purely *software* abstraction of persistent memory that can be implemented on conventional hardware, without NVM. The software abstraction invites the same programming style as NVM hardware, with similar advantages. Applications become dramatically simpler because they store persistent data in memory and manipulate it with CPU instructions, eliminating large, complex, opaque external persistent stores such as relational databases and key-value stores. Furthermore, persistent application data exist only in the in-memory format, eliminating the need for translation to and from a different persistent format. Finally, programmers think only in the paradigm of imperative algorithms for in-memory data structures; there is no need to switch mentally to a different paradigm, such as declarative SQL manipulating relational data.

We begin with a thorough review of the one commonality underlying all persistent memory programming platforms and frameworks for C/C++ from both industry and academia: the art of laying out application data in file-backed memory mappings. We master a handful of essential techniques, idioms, and tricks that enable developers to confidently write efficient and correct software in the persistent memory style. We learn the tradeoffs between relocatable vs. fixed-address persistent data structures. We flag the pitfalls surrounding multi-threaded persistent memory programming and learn to circumvent them. We survey crash-tolerance mechanisms for the persistent memory style of programming, including those that require special NVM hardware (NVDIMMs or Optane) and also those that admit implementation on conventional hardware (volatile DRAM and block storage). We show how to write from scratch, in two dozen lines of straightforward C code, an efficient and portable crash-tolerance mechanism that works on both conventional hardware and NVM. We learn from extensive industry experience with successfully retrofitting crash-tolerance onto complex legacy production software that was not originally designed to survive crashes.

The tutorial will consider working C/C++ programs, showing how to evolve non-persistent data structures into persistent data structures, and then showing how to prevent crashes from corrupting them. At each stage in the progression, code deltas illustrate the steps that the developer must take and the factors that she must consider. Example code and explanatory documentation will be provided. Most importantly, students will learn how to write such software themselves.
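As a rough illustration of the file-backed memory mapping idea described above, the following minimal sketch keeps a single counter in a mapped file so that its value survives across separate openings of the file. It is written in Python for brevity, whereas the tutorial itself works in C/C++; the names `bump`, `demo`, `COUNTER`, and the file name `counter.bin` are hypothetical, and the sketch does not attempt any of the crash-tolerance mechanisms the tutorial covers.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one unsigned 64-bit little-endian counter at offset 0.
COUNTER = struct.Struct("<Q")


def bump(path):
    """Increment a counter that lives in a file-backed memory mapping."""
    # Create the backing file on first use and size it to hold the counter.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < COUNTER.size:
            os.ftruncate(fd, COUNTER.size)  # new bytes read back as zero
        with mmap.mmap(fd, COUNTER.size) as mem:
            (value,) = COUNTER.unpack_from(mem, 0)
            COUNTER.pack_into(mem, 0, value + 1)  # update in place, in memory
            mem.flush()  # ask the OS to write the dirty page to the file
            return value + 1
    finally:
        os.close(fd)


def demo():
    """Call bump() three times against one hypothetical backing file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counter.bin")
        return [bump(path) for _ in range(3)]
```

Each call to `bump` opens, maps, and unmaps the file afresh, so the increasing values returned by `demo` come from the backing file rather than from any state held in the process.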

Students are encouraged but not required to prepare by reading the following article:

"Persistent Memory Programming on Conventional Hardware"
ACM Queue magazine, Vol. 17, No. 4, July/August 2019

Terence Kelly

Terence Kelly studied Computer Science at Princeton and the University of Michigan. He was a researcher at HP Labs for 14 years, the last five of which were devoted to software support for non-volatile memory. His research publications on persistent memory programming have appeared in ASPLOS, FAST, DISC, USENIX ATC, and EuroSys; his research publications on multi-threaded programming have appeared in OSDI and POPL. Kelly's persistent memory research led to several tech transfers, notably to HP Indigo printing presses and to the HP Advanced File System. His practitioner-oriented articles on persistent memory programming have appeared in ACM Queue and USENIX ;login:. Kelly now teaches and evangelizes the persistent memory style of programming. He has released three software packages related to persistent memory programming. Kelly's publications and patents are listed at http://ai.eecs.umich.edu/~tpkelly/papers/

Morning Tutorial 2: An introduction to NVMe Zoned Namespaces

Monday, 9:00 am–12:30 pm

Grand Ballroom E

Simon Lund and Klaus Jensen, Samsung

Zoned Namespaces (ZNS) are bringing the first wave of Open-Channel SSD concepts to standardization in NVMe. While ZNS promises improvements in write amplification factor (WAF), tail latencies, and cost, the need for changes to host software remains a concern for broad adoption. In this tutorial, we will cover the concepts behind ZNS and the extensions already under standardization in NVMe, and then focus on the work being done in Linux to support ZNS, from the extensions to the existing zoned block framework all the way to target applications. More specifically, we will cover:

Linux kernel ecosystem: How does ZNS fit in the Linux Zoned block ecosystem? What are the options to boot a kernel with ZNS drives and the tools to manage them?
User-space libraries: With a number of emerging namespace types and command sets in NVMe (e.g., ZNS, KV, Computational Storage), we need a library that abstracts the details of each technology and allows for a generic programming model. Since NVMe has traditionally been block-based, such a library has never been needed. For this purpose, we are building xNVMe (Cross NVMe), which combines NVMe core functionality on a common API with extensions for namespace types (e.g., ZNS). It also allows transparent use of different host-device transports (e.g., libaio, io_uring, SPDK, and other interfaces in FreeBSD). In the tutorial, we can show how to use such a library as well as how it integrates with well-known applications such as RocksDB. Note that all the work is open source and will be upstreamed. We also expect to create and maintain packages for different distributions.
Emulation in QEMU: QEMU supports NVMe at the moment, but the bulk of the work was de-prioritized when devices became available. However, in our OCSSD days, we learned that emulation is very useful when the interface is not as well known as block. For this purpose, we have implemented full NVMe 1.3 and 1.4 support as well as ZNS support in QEMU. All the work is being upstreamed (as TPs are ratified). In the tutorial, we can cover how to set up QEMU to emulate ZNS devices and how to debug problems seen on real hardware by reproducing them in QEMU.

Learning Objectives

Understand the changes needed to support Zoned Namespaces in existing applications
Become familiar with the Linux framework for Zoned devices and understand which classes of applications can benefit from ZNS
Become familiar with the open-source support for ZNS available across the Linux stack

Simon Lund, Samsung

Simon Lund is a Staff Engineer at Samsung. His current work revolves around reducing the cognitive load for developers adopting emerging storage interfaces. Before Samsung, he worked at CNEX Labs designing and implementing liblightnvm: the Open-Channel SSD User Space Library. Simon received his Ph.D. on High Performance Backends for Array-Oriented Programming on Next-Generation Processing Units at the University of Copenhagen. He gave several talks on programming language, interpreter, and compiler design for HPC during his Ph.D. and, most recently, has spoken in industry at the SNIA Storage Developer Conference. Regardless of the topic, Simon's focus is the same: to bridge the gap between high-level abstractions and low-level control, and to measure the cost and benefit of doing so.

Klaus Jensen, Samsung

Klaus Jensen is a Software Engineer with a background in academia. He has worked in the area of High Performance Computing, avoided users as an old-school UNIX sysop, done a stint in an IT consultancy, written a Ph.D. on tape, and been involved in the Open-Channel SSD community. He now works on NVMe emulation and the NVMe software stack at Samsung Electronics.

Half-Day Afternoon Session

Afternoon Tutorial 1: Designing Modern Software Caches

Monday, 1:30 pm–5:00 pm

Grand Ballroom D

Roy Friedman, Technion—Israel Institute of Technology; Benjamin Manes, Vector

Caching is one of the most basic and most effective mechanisms for boosting the performance of computing and storage systems.

In this tutorial we will survey recent developments in the design of software cache libraries, using Guava, Caffeine, and Ristretto as running examples. We will start by explaining some basic caching concepts and terminology, then discuss how software caches differ from hardware designs and what makes designing an effective software cache library challenging.

In particular, we will address issues such as concurrency, memory management, timer handling, and cache management (admission and eviction). We will also address open research directions.

Specifically, the tutorial will include 4 sections:

An introduction to software caching: basic concepts, differences from hardware caches, principle of locality, significance of workload, other challenges (30 minutes + 5 minutes for questions).
General design concerns: effective concurrency, watermarks for evacuation, timer handling (60 minutes + 10-minute break at the end).
Cache management: a survey of modern cache management schemes (replacement and evacuations) including ARC, CAR, Hyperbolic, W-TinyLFU, FRD, and Mini-Sim (60 minutes + 10-minute break at the end).
Open research directions: including, e.g., entry cost (weight, latency), capacity allocations for multiple tenants, prediction of workload changes (30 minutes + 5 minutes for questions).
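For readers new to the terminology in the outline above, here is a minimal sketch of a bounded software cache with least-recently-used (LRU) eviction. LRU is a classic baseline and is not one of the schemes named in the outline; the class name `LRUCache` and its methods are hypothetical and do not reflect the API of Guava, Caffeine, or Ristretto. The sketch is single-threaded and leaves out the concurrency, timer, and admission concerns that the tutorial covers.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        """Return the cached value and mark the entry as recently used."""
        if key in self._entries:
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return default

    def put(self, key, value):
        """Insert or update an entry, evicting the oldest one if over capacity."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._entries)
```

The hit and miss counters are included because hit ratio on a given workload is the usual yardstick for comparing replacement policies.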

Roy Friedman, Technion—Israel Institute of Technology

Roy Friedman is a professor in the Department of Computer Science at the Technion—Israel Institute of Technology. His research interests include Network Streaming Protocols, Caching, Replication, Fault-Tolerance, Dependability, High Availability, Consistency, and Mobile Computing. Roy Friedman serves as an associate editor for the IEEE TDSC and as PC co-chair of OPODIS 2019. In the past, he served as PC co-chair for ACM DEBS 2015, ACM SYSTOR 2014, and Autonomics 2009, as vice-chair for IEEE ICDCS 2013 and 2006 and EuroPar 2008 and 2003, and as fast abstract chair for IEEE DSN 2013. He has published more than 150 papers and holds 3 US patents. Formerly, Roy Friedman was an academic specialist at INRIA (France) and a researcher at Cornell University (USA). He is a founder of PolyServe Inc. (acquired by HP) and holds a Ph.D. and a B.Sc. from the Technion.

Benjamin Manes, Vector

Ben Manes is CTO of Vector, a software company offering solutions for the trucking industry. Previously, when at Google, Ben co-authored Google Guava's Cache based on his successful ConcurrentLinkedHashMap library. In collaboration with Roy Friedman and his team at the Technion, Ben developed Caffeine cache. Like its predecessors, Caffeine has seen wide adoption in the Java ecosystem. He is currently advising the Ristretto team who aim to replicate Caffeine for the Go community. Ben holds an M.Sc. and two B.Sc. from the Illinois Institute of Technology.

Afternoon Tutorial 2: An Introduction to the Implementation of ZFS

Monday, 1:30 pm–5:00 pm

Grand Ballroom E

Dr. Marshall Kirk McKusick, Author and Consultant

Much has been documented about how to use ZFS, but little has been written about how it is implemented. This tutorial pulls back the covers to describe the design and implementation of ZFS. The content of this tutorial was developed by scouring blog posts, tracking down unpublished papers, spending hours reading through the quarter-million lines of code that implement ZFS, and exchanging endless email with the ZFS developers themselves. The result is a concise description of an elegant and powerful system. It does not cover how to use and administer ZFS.

ZFS was originally implemented in Sun Microsystems' Solaris operating system in the early 2000s. It was released as open source when Sun released OpenSolaris and shortly afterward was incorporated into FreeBSD, where it has become the most actively used filesystem. It was ported to Linux and has been continuously supported since by a group at Lawrence Livermore National Laboratory. Though it is not in the standard Linux distribution due to possible licensing conflicts, it has been included by Ubuntu since 2016 and can be added to most other Linux distributions.

Marshall Kirk McKusick, Author and Consultant

Dr. Marshall Kirk McKusick's work with Unix and BSD development spans over four decades. It begins with his first paper on the implementation of Berkeley Pascal in 1979, goes on to his pioneering work in the eighties on the BSD Fast File System, the BSD virtual memory system, the final release of 4.4BSD-Lite from the UC Berkeley Computer Systems Research Group, and carries on with his work on FreeBSD. A key figure in Unix and BSD development, his experiences chronicle not only the innovative technical achievements but also the interesting personalities and philosophical debates in Unix over the past forty years.

Continuing Education Units (CEUs)

USENIX provides Continuing Education Units for a small additional administrative fee. The CEU is a nationally recognized standard unit of measure for continuing education and training and is used by thousands of organizations.

Two half-day tutorials qualify for 0.6 CEUs. You can request CEU credit by completing the CEU section on the registration form. USENIX provides a certificate for each attendee taking a tutorial for CEU credit. CEUs are not the same as college credits. Consult your employer or school to determine their applicability.

FAST '20 Training Program
