Vault '20 Conference Program

Monday, February 24

8:00 am–9:00 am

Continental Breakfast

9:00 am–10:30 am

Zoned Storage

Zone Append: A New Way of Writing to Zoned Storage

Monday, 9:00 am–9:30 am

Matias Bjørling, Western Digital, Inc.

The global hunger for more storage capacity requires us to rethink the conventional storage interfaces used in today's storage infrastructures. The Zoned Namespaces (ZNS) SSD interface is a new standardized storage interface that is being adopted by over 31% of the total cloud market. Compared to typical enterprise SSDs, ZNS SSDs provide more than 20% additional capacity and require an order of magnitude less device DRAM, while improving tail latency (QoS). The Zoned Namespace interface defines sequential-write-required zones (similar to ZAC/ZBC for SMR HDDs), which means host applications must write sequentially. This talk focuses on the newly introduced Zone Append I/O command, which enables a ZNS SSD to perform data placement within a given zone. The Zone Append command opens up new ways to innovate in your storage system, such as (1) writing directly from clients to ZNS SSDs when using a distributed file system, (2) offloading fine-grained data placement to the SSD, and (3) eliminating the single-writer-per-zone bottleneck.
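The difference between a regular zone write and Zone Append can be pictured with a toy model (a hypothetical sketch in Python; the class and method names are invented for illustration and are not the NVMe ZNS interface):

```python
# Toy model of a sequential-write-required zone and the Zone Append
# command. Illustrative only; real ZNS devices implement this in hardware.

class Zone:
    def __init__(self, start_lba, capacity):
        self.start_lba = start_lba
        self.capacity = capacity
        self.write_pointer = start_lba  # next writable LBA

    def write(self, lba, nblocks):
        # Regular write: must target the current write pointer exactly,
        # which forces a single well-ordered writer per zone.
        if lba != self.write_pointer:
            raise IOError("write not at write pointer")
        self.write_pointer += nblocks
        return lba

    def append(self, nblocks):
        # Zone Append: the host names only the zone; the device picks the
        # placement and returns the LBA where the data actually landed,
        # so many writers can submit to one zone concurrently.
        if self.write_pointer + nblocks > self.start_lba + self.capacity:
            raise IOError("zone full")
        lba = self.write_pointer
        self.write_pointer += nblocks
        return lba

zone = Zone(start_lba=0, capacity=1024)
first = zone.append(8)    # device places the data and reports LBA 0
second = zone.append(8)   # a concurrent appender lands at LBA 8
```

Because the device reports where each append landed, the host never has to serialize submitters on the write pointer, which is what enables use cases (1)–(3) above.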

Matias Bjørling, Western Digital, Inc.

Data continues to grow year over year at a rate of 61% and is projected to reach 175 zettabytes by 2025. This growth requires the industry to rethink how we store, access, and manage our data. Matias Bjørling, Ph.D., is hence on a path to shed conventional host interfaces, combine the result with leading open-source ecosystems, and enable it to scale to our ever-increasing storage needs. Matias is one of the storage industry's leading storage architects and developers, widely recognized for his work on moving the industry to understand, use, and take advantage of Open-Channel SSD architectures. Matias is currently a Director of Emerging System Architectures at Western Digital, where he defines the next-generation storage interface to be adopted by more than 31% of the 38.9B total cloud market. This involves chairing the Zoned Namespaces (ZNS) SSD technical proposal in the NVMe Working Group, engaging with internal and external adopters, and leading a global team to enable the open-source ecosystem.

zonefs: Mapping POSIX File System Interface to Raw Zoned Block Device Accesses

Monday, 9:30 am–10:00 am

Damien Le Moal, Western Digital Research; Ting Yao, Huazhong University of Science and Technology

zonefs is a new file system proposed for inclusion in the Linux kernel. zonefs exposes the zones of a zoned block device as files, simplifying application use of these types of storage devices. zonefs is not a full-featured POSIX-compliant file system; it is intended to replace and simplify use cases where raw block device access would otherwise be the better solution. This talk will present zonefs features, with a focus on how the rich POSIX system call interface is used and mapped to directly issue device-specific control commands to zoned block devices. An example use of zonefs in LevelDB will be shown, along with its advantages in terms of code simplicity over regular block device file access. Finally, we will discuss how zonefs can be extended to also support new types of zoned storage devices such as NVMe Zoned Namespace (ZNS) SSDs.
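The kind of mapping described can be pictured with a toy in-memory model of a file over a sequential zone (invented names; the real implementation is kernel code): appends advance the zone write pointer, reads work anywhere below it, and truncation to zero stands in for a zone reset.

```python
# Toy sketch of sequential-zone file semantics, zonefs-style.
# Not the real kernel code; an illustration of the POSIX mapping only.

class ZoneFile:
    def __init__(self, max_size):
        self.max_size = max_size
        self.data = bytearray()

    def append(self, buf):
        # Sequential-write-required: writes are only possible at the
        # current end of file, mirroring the zone write pointer.
        if len(self.data) + len(buf) > self.max_size:
            raise IOError("zone full")
        self.data += buf
        return len(self.data)  # new write-pointer position

    def read(self, offset, length):
        # Reads are allowed anywhere below the write pointer.
        return bytes(self.data[offset:offset + length])

    def truncate(self, size):
        # Only truncation to zero is meaningful: it maps to a zone reset.
        if size != 0:
            raise IOError("only truncate(0) is supported")
        self.data = bytearray()

f = ZoneFile(max_size=256)
f.append(b"record-1")
assert f.read(0, 8) == b"record-1"
f.truncate(0)                # zone reset
assert f.read(0, 8) == b""
```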

Damien Le Moal, Western Digital Research

Damien Le Moal manages the open-source system software group in Western Digital Research. He is a regular contributor to Linux kernel block, scsi and device-mapper subsystems and the author of zonefs. Damien regularly presents Western Digital work in the area of Linux ecosystem at various conferences, including several presentations at past Vault events.

File System Support for Zoned Block Devices

Monday, 10:00 am–10:30 am

Naohiro Aota, Western Digital

Zoned block device (ZBD) support was introduced in Linux with kernel version 4.10. ZBDs have different write constraints than regular block devices. A ZBD is divided into zones, and each zone must be written sequentially and must be reset before being rewritten.

The main type of ZBD currently available is the SMR HDD. The NVMe Zoned Namespace proposal is also being drafted to add a zone abstraction to the NVMe specification.

Natively supporting ZBDs in a filesystem is not a trivial change. Some filesystems must rely on special block layer drivers to ensure sequential writes (e.g., ext4 with the dm-zoned device mapper). Filesystems using a copy-on-write design are better candidates for native ZBD support; examples are F2FS and btrfs.

This talk discusses the principles of native ZBD and ZNS support in filesystems. Support in F2FS is discussed first, followed by the approach taken with btrfs. The talk concludes with a performance comparison between filesystems with native ZBD/ZNS support and regular filesystems using dm-zoned.

Naohiro Aota, Western Digital

Naohiro Aota is working at the System Software Group within Western Digital Research. He is working on zoned block device support for file systems like btrfs. He presented the on-going btrfs work at LSFMM 2019 and Open Source Summit Europe 2019.

10:30 am–11:00 am

Break with Coffee and Tea

11:00 am–12:30 pm

Emerging Interfaces

xadfs: Inverted DAX

Monday, 11:00 am–11:30 am

Hannes Reinecke, SUSE Linux

The DAX functionality provides a mechanism to leverage persistent-memory features from within existing filesystems. It is currently geared primarily toward the 'mmap()' call, allowing applications to reap the benefits of persistent memory while still having a normal filesystem. However, most filesystems are designed to handle large block I/O efficiently, as that is what traditional storage works best with. Unfortunately, this is precisely what NVDIMM is notoriously bad at; NVDIMM really excels at small, even bit-wise, I/O patterns, and some systems even incur a performance penalty for large I/O blocks. Also, tests with standard I/O performance tools like 'fio' do not show any noticeable benefit from using DAX.

xadfs inverts this concept: use mmap() as the default I/O path, and map all other calls onto it. The prime goal is to use NVDIMMs as efficiently as possible to demonstrate what performance benefits can be expected. With these results we will gain a better understanding of which use cases NVDIMMs should be employed for, and which are better left to traditional storage.
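The inversion can be sketched in a few lines: implement a read()-style call as a plain copy out of an mmap() view. This hypothetical example uses an ordinary temporary file to stand in for an NVDIMM-backed region; the function name is invented for illustration.

```python
# Sketch of the xadfs idea: mmap() is the primary I/O path, and
# byte-oriented reads are expressed on top of it. On real persistent
# memory the copy below would be plain loads, with no block I/O at all.
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"persistent-memory-contents")
os.fsync(fd)

size = os.fstat(fd).st_size
view = mmap.mmap(fd, size)       # the "default I/O path"

def pread_via_mmap(offset, length):
    # A read() mapped onto the memory view: just a copy out of the map.
    return bytes(view[offset:offset + length])

data = pread_via_mmap(0, 10)     # b"persistent"
view.close(); os.close(fd); os.unlink(path)
```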

In this talk I will present a proof-of-concept implementation of xadfs and provide a performance comparison with traditional filesystems implementing DAX functionality.

Hannes Reinecke, SUSE Linux

Hannes studied physics with a main focus on image processing in Heidelberg from 1990 until 1997, followed by a PhD from Edinburgh's Heriot-Watt University in 2000. He worked as a sysadmin during his studies, mainly at the Mathematical Institute in Heidelberg. He now works at SUSE Labs as team lead for storage and networking, and is the principal contact point for storage-related issues on SLES.

Linux addict since the earliest days (0.95); various patches to get Linux up and running. Main points of interest are storage, (i)SCSI, FC/FCoE, NVMe-over-Fabrics, and multipathing. And S/390, naturally.

I'm active on the Linux SCSI and NVMe mailing list, reviewing patches and dusting out murky corners in the SCSI stack. Plus occasionally maintaining the FCoE stack.

Linux User Library for NVM Express

Monday, 11:30 am–12:00 pm

Keith Busch, WDC

The NVM Express workgroup introduces new features frequently, and the Linux kernel support for these devices evolves with it. This ever-moving target creates challenges for tool development as new interfaces are created or older ones change. This talk will cover some of these recent enhancements and introduce the new open source 'libnvme': a common library, developed in a public repository, that provides access to all NVM Express features with convenient abstractions over the kernel interfaces for interacting with your devices. This session will also provide an opportunity for others to share what additional features they would like to see from this common library in the future.

Keith Busch, WDC

Keith Busch develops, promotes, and maintains NVM Express and software that enables this protocol. He has provided many talks on the subject across the world since the introduction of this storage standard.

Programming Emerging Storage Interfaces

Monday, 12:00 pm–12:30 pm

Simon Lund, Samsung

The popularity of NVMe has gone beyond the limits of the block device. Currently, NVMe is standardizing Key-Value (KV) and Zoned (ZNS) namespaces, and discussions on the standardization of computational storage namespaces have already started.

While modern I/O submission APIs are designed to support non-block submission (e.g., io_uring), these new interfaces place an extra burden on applications, which now need to deal with memory constraints (e.g., barriers, DMA-able memory).

To address this problem, we have created xNVMe (pronounced cross-NVMe): a user-space library that provides a generic layer for memory allocations and I/O submission, and abstracts the underlying I/O engine (e.g., libaio, io_uring, SPDK).

In this talk, we (i) present the design and architecture of xNVMe, (ii) give examples of how applications can easily integrate with it and (iii) provide an evaluation of the overhead that it adds to the I/O path.

Simon Lund, Samsung

Simon Lund is a Staff Engineer at Samsung. His current work revolves around reducing the cognitive load for developers adopting emerging storage interfaces. Before Samsung, he worked at CNEX Labs designing and implementing liblightnvm: the Open-Channel SSD User Space Library. Simon received his Ph.D. on High Performance Backends for Array-Oriented Programming on Next-Generation Processing Units at the University of Copenhagen. He gave several talks on programming language, interpreter, and compiler design for HPC during his Ph.D., and most recently in industry at the SNIA Storage Developer Conference. Regardless of the topic, Simon's focus is the same: to bridge the gap between high-level abstractions and low-level control, and to measure the cost and benefit of doing so.

12:30 pm–1:30 pm

Conference Luncheon

1:30 pm–3:00 pm

Scale Out

Scaling Databases and File APIs with Programmable Ceph Object Storage

Monday, 1:30 pm–2:00 pm

Jeff LeFevre and Carlos Maltzahn, University of California, Santa Cruz

The Skyhook Data Management project (SkyhookDM.com) at the Center for Research in Open Source Software (cross.ucsc.edu) at UC Santa Cruz implements customized extensions through Ceph's object class interface that enable offloading database operations to the storage system. In our previous Vault '19 talk, we showed how SkyhookDM can transparently scale out databases. The SkyhookDM Ceph extensions are an example of our 'programmable storage' research efforts at UCSC, and can be accessed through commonly available external/foreign table database interfaces. Utilizing fast in-memory serialization libraries such as Google Flatbuffers and Apache Arrow, SkyhookDM currently implements common database functions such as SELECT, PROJECT, AGGREGATE, and indexing inside Ceph, along with lower-level data manipulations such as transforming data from row to column formats on RADOS servers.

In this talk, we will present three of our latest developments on the SkyhookDM project since Vault '19. First, SkyhookDM can be used to also offload operations of access libraries that support plugins for backends, such as HDF5 and its Virtual Object Layer. Second, in addition to row-oriented data format using Google's Flatbuffers, we have added support for column-oriented data formats using the Apache Arrow library within our Ceph extensions. Third, we added dynamic switching between row and column data formats within Ceph objects, a first step towards physical design management in storage systems, similar to physical design tuning in database systems.

Jeff LeFevre, University of California, Santa Cruz

Jeff LeFevre is an adjunct professor for Computer Science & Engineering at UC Santa Cruz. He currently leads the SkyhookDM project, and his research interests are in cloud databases, database physical design, and storage systems. Dr. LeFevre joined the CSE faculty in 2018, and has previously worked on the Vertica database for HP.

Carlos Maltzahn, University of California, Santa Cruz

Carlos Maltzahn is an adjunct professor for Computer Science & Engineering at UC Santa Cruz. He is the founder and director of Center for Research in Open Source Software (cross.ucsc.edu), and a co-founder of the Systems Research Lab, known for its cutting-edge work on programmable storage systems, big data storage & processing, scalable data management, distributed system performance management, and practical replicable evaluation of computer systems. In 2005 he co-founded and became a key mentor on Sage Weil’s Ceph project. Dr. Maltzahn joined the CSE faculty in 2008, has graduated nine Ph.D. students since, and has previously worked on storage for NetApp.

Scaling HDFS with Consistent Reads from Standby Replicas

Monday, 2:00 pm–2:30 pm

Konstantin Shvachko, LinkedIn

We introduce a novel technique for serving read requests from the standby replicas of a metadata service in an active-standby architecture. The technique is implemented in the Hadoop Distributed File System (HDFS). It substantially improves the performance of the metadata service and the overall scalability of the entire system. We introduce a strong consistency model and show how HDFS addresses both the read-your-own-writes and third-party-communication consistency challenges. The talk will outline the HDFS architecture and its scalability and performance constraints, describe the architecture of consistent reads from standby, and provide performance results based on a real-life, exponentially growing Hadoop cluster at LinkedIn.
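The read-your-own-writes challenge can be sketched as a toy model (invented class names, not HDFS code): each write returns the journal transaction id it produced, a client sends its last-seen txid with reads, and a standby serves the read only once it has applied at least that txid.

```python
# Toy active/standby metadata pair with txid-based consistent reads.
# A simplified illustration of the technique, not the HDFS implementation.

class Standby:
    def __init__(self):
        self.applied_txid = 0
        self.journal = []       # shared edit log, in arrival order
        self.state = {}

    def tail_journal(self, upto):
        # Apply journal entries up to the requested transaction id.
        for txid, key, value in self.journal:
            if self.applied_txid < txid <= upto:
                self.state[key] = value
                self.applied_txid = txid

    def read(self, key, client_txid):
        # Serve only when fresh enough for this client (here: catch up;
        # a real standby would make the request wait instead).
        if self.applied_txid < client_txid:
            self.tail_journal(upto=client_txid)
        return self.state.get(key)

class Active:
    def __init__(self, standby):
        self.txid = 0
        self.state = {}
        self.standby = standby

    def write(self, key, value):
        self.txid += 1
        self.state[key] = value
        self.standby.journal.append((self.txid, key, value))
        return self.txid        # the client remembers this

standby = Standby()
active = Active(standby)
seen = active.write("/file", "v1")
assert standby.read("/file", client_txid=seen) == "v1"  # own write visible
```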

Konstantin Shvachko, LinkedIn

Konstantin V. Shvachko is an expert in Big Data technologies, file systems, and storage solutions. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin is known as an open-source software developer, author, inventor, and entrepreneur. He is a senior staff software engineer at LinkedIn.

Surviving a Disk Apocalypse with Single-Overlap Declustered Parity

Monday, 2:30 pm–3:00 pm

Huan Ke, The University of Chicago

Massive storage systems composed of tens of thousands of disks are increasingly common in high-performance computing data centers. With such an enormous number of components integrated within the storage system, the probability of correlated failures across a large number of components becomes a critical concern in preventing data loss. To better protect against correlated failures we introduce Single-Overlap Declustered Parity (SODP), a novel declustered parity design that tolerates large numbers of disk failures and minimizes rebuild time. Our evaluation results show that during large failure bursts, SODP improves protection against data loss by 20x compared to traditional declustered parity while providing almost identical rebuild times to the current state of the art.
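The "single-overlap" property named in the title can be checked mechanically: in such a layout, any two disks share at most one stripe, so a burst of failures intersects each surviving stripe in few places. The layout below is a made-up toy example (a 3x3 resolvable design), not the SODP construction from the talk.

```python
# Toy checker for the single-overlap property of a parity layout.
from collections import Counter
from itertools import combinations

def max_pairwise_overlap(stripes):
    """Over all disk pairs, the maximum number of stripes a pair shares."""
    pairs = Counter()
    for stripe in stripes:
        for a, b in combinations(sorted(stripe), 2):
            pairs[(a, b)] += 1
    return max(pairs.values())

# Hypothetical layout: 9 disks, 3-disk stripes, no two disks meet twice
# (rows, columns, and two "diagonal" parallel classes of a 3x3 grid).
layout = [
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
    {0, 4, 8}, {1, 5, 6}, {2, 3, 7},
]
assert max_pairwise_overlap(layout) == 1   # single overlap holds

# Clustered RAID, by contrast, reuses the same disk group repeatedly:
assert max_pairwise_overlap([{0, 1, 2}, {0, 1, 2}]) == 2
```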

Huan Ke, The University of Chicago

Huan Ke is a PhD student in the Department of Computer Science at the University of Chicago, supervised by Professor Haryadi S. Gunawi. She is currently an intern at Los Alamos National Laboratory, where she primarily works with Bradley Settlemyer. Her research interests lie in system performance, reliability, and scalability. In particular, she works on modeling system behaviors, detecting concurrency bugs, optimizing tail-latency problems, and checking scalability issues. Her current research agenda explores how to develop new data layouts to tolerate massive failures in large-scale storage systems.

3:00 pm–3:30 pm

Break with Refreshments

3:30 pm–5:00 pm

Distributed File Systems

Crimson: A New Ceph OSD for the Age of Persistent Memory and Fast NVMe Storage

Monday, 3:30 pm–4:00 pm

Samuel Just, Red Hat

The Crimson project is an effort to build a replacement ceph-osd daemon well suited to the new reality of low-latency, high-throughput persistent memory and NVMe technologies. Built on the Seastar C++ framework, crimson-osd aims to fully exploit these devices by minimizing latency, CPU overhead, and cross-core communication. This talk will discuss the design, current status, and future direction of the Crimson project.

Samuel Just, Red Hat

Sam Just is an engineer at Red Hat focusing on Ceph. He began contributing to the Ceph project in 2010 and was the rados tech lead until 2017. Presently, his focus includes crimson and other projects within rados.

Sam has past experience speaking about Ceph, including at Vault.

Asynchronous Directory Operations in CephFS

Monday, 4:00 pm–4:30 pm

Jeffrey Layton, Red Hat

Metadata-heavy workloads are often the bane of networked and clustered filesystems. Directory operations (create and unlink, in particular) usually involve making a synchronous request to a server on the network, which can be very slow.

CephFS however has a novel mechanism for delegating the ability for clients to do certain operations locally. While that mechanism has mostly been used to delegate capabilities on normal files in the past, it's possible to extend this to cover certain types of directory operations as well.

The talk will describe work being done to bring asynchronous directory operations to CephFS. It will cover the design and tradeoffs necessary to allow for asynchronous directory operations, discuss the server- and client-side infrastructure being added to support them, and the performance gains we expect from this.

Jeffrey Layton, Red Hat

Jeff Layton is a long time Linux kernel developer specializing in network file systems. He has made significant contributions to the kernel's NFS client and server, the CIFS client and the kernel's VFS layer. Recently, he has taken over as the maintainer of the Linux kernel's CephFS driver.

Introduction to Client-side Caching in Ceph

Monday, 4:30 pm–5:00 pm

Mahati Chamarthy, Intel

Ceph is a unified distributed storage system. Caching plays a significant role in enhancing the performance of distributed software, and caching in Ceph user space has been evolving accordingly. This talk will explore the different caching policies that exist today for Ceph block storage (RBD) and go into the design details of upcoming feature work: a write-back cache implementation on SSDs and persistent memory.

Mahati Chamarthy, Intel

Mahati Chamarthy has been contributing to storage technologies for the past few years. She was a core developer for OpenStack Object Storage (Swift) and now an active contributor to Ceph. She works as a Cloud Software Engineer with Intel focusing on storage software development.

Previous speaking experience includes presenting at various OpenStack conferences, Ceph-related conferences, and the Linux Storage and Filesystems conference.

5:00 pm–5:15 pm

Short Break

5:15 pm–6:15 pm

Distributed File Systems (continued)

Ephemeral Pinning: A Dynamic Metadata Management Strategy for CephFS

Monday, 5:15 pm–5:45 pm

Sidharth Anupkrishnan, Red Hat

Having a separate cluster of metadata servers (MDSs) is a well-known design strategy among distributed file-system architectures. One challenge faced by this approach is how to distribute metadata among the MDSs. Unlike data storage and its associated I/O throughput, which can be scaled linearly with the number of storage devices, file-system metadata is a fairly complex entity to scale due to its hierarchical nature. At first glance, a pure hashing-based metadata distribution strategy seems like a perfect fit. But this is not exactly the case. What are the pitfalls? Too many inter-MDS hops (due to POSIX traversal semantics) and the loss of hierarchical locality degrade file-system performance; as a result, hashing is not beneficial for a workload whose directory hierarchy grows in depth rather than breadth. CephFS's metadata balancer takes a different approach by partitioning metadata sub-trees across MDSs, thereby preserving good locality. Although efficient, this involves a lot of back-and-forth migration of sub-trees, and the locality benefits are sometimes trumped by sub-optimal distributions.

In this talk, we present a new metadata distribution strategy employed in CephFS—Ephemeral Pinning. This strategy combines the benefits of hashing and naive sub-tree partitioning by intelligently pinning sub-trees to MDSs so as to obtain a balanced distribution as the workload metadata grows in depth and breadth. A consistent-hashing-based load balancer helps maintain an optimal distribution during the addition or failure of MDSs.
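The consistent-hashing part of the scheme can be sketched as follows (a hypothetical simplification with invented names; the real CephFS policy differs in detail): each subtree hashes to a point on a ring of MDS ranks, so its pin is independent of its siblings, and adding or losing one MDS remaps only a fraction of the subtrees.

```python
# Toy consistent-hash ring assigning directory subtrees to MDS ranks.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, ranks, vnodes=64):
        # Each rank gets many virtual points for an even spread.
        self.ring = sorted(
            (self._hash(f"mds.{rank}/{v}"), rank)
            for rank in ranks for v in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def owner(self, subtree_path):
        # Pin a subtree to the first rank at or after its hash point.
        h = self._hash(subtree_path)
        i = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(ranks=[0, 1, 2])
pin = ring.owner("/home/alice")
assert pin in (0, 1, 2)
assert ring.owner("/home/alice") == pin   # deterministic pin
```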

Sidharth Anupkrishnan, Red Hat

Sidharth is part of the CephFS team at Red Hat. His focus is on the MDS component of CephFS and improving its scalability and standalone performance. He has presented on metadata distribution strategies at Devconf India 2019.

Reworking Observability In Ceph

Monday, 5:45 pm–6:15 pm

Deepika Upadhya and Prajith Kesava Prasad, Red Hat

Jaeger and OpenTracing provide ready-to-use tracing services for distributed systems and are becoming a widely used de facto standard because of their ease of use. By making use of these libraries, Ceph can reach a much-improved monitoring state, with visibility into its background distributed processes. This would, in turn, improve the way Ceph is debugged, “making Ceph more transparent” in identifying abnormalities. In this session, the audience will learn about using distributed tracing in large-scale distributed systems like Ceph, get an overview of Jaeger tracing in Ceph, and see how it can be used for debugging Ceph.

Deepika Upadhya, Red Hat

Deepika was an Outreachy intern in Summer '19, during which she worked on adding Jaeger and OpenTracing (distributed tracing libraries) to Ceph. She is now continuing her work on Ceph as a full-time employee, with RADOS as her main area of focus.

Prajith Kesava Prasad, Red Hat

Prajith has published research and patent work revolving around using technology to improve lives. He has experience working with Amazon and is currently developing a deployment solution for GlusterFS.

6:15 pm–8:00 pm

Dinner (on your own)

8:00 pm–10:00 pm

Birds-of-a-Feather Sessions (BoFs)

The evening Birds-of-a-Feather Sessions (BoFs) will be a forum for open discussion on topics of interest to the community. A few participants will present short introductions, overviews, or status reports from projects relevant to each of the topics, followed by informal discussion and participation. If you wish to present on a particular topic and have not already been contacted, please send us a short proposal at vault20chairs@usenix.org.

Tuesday, February 25

8:00 am–9:00 am

Continental Breakfast

9:00 am–10:00 am

Local Block Storage

Using Linux Block Integrity in Building and Testing Storage Systems

Tuesday, 9:00 am–9:30 am

Mikhail Malygin, Yadro

Linux block integrity is a well-known block-layer subsystem that helps detect and prevent data corruption. This talk is based on hands-on experience in building and testing storage systems and presents solutions to challenges faced in the block integrity stack. It also covers specifics of the integrity implementation in the SCSI and NVMe kernel drivers, as well as in virtual environments: QEMU, virtio, and vhost.

Mikhail Malygin, Yadro

Mikhail is a Principal Software Engineer at YADRO, with 10 years of experience in storage system design and architecture. His technology expertise spans storage, distributed systems, scale-out solutions, operating systems, networks, performance, and reliability. He is an open source contributor and conference speaker, and the leading engineer for YADRO Tatlin (unified storage), whose data protection layer he architected; Tatlin reached 500+ petabytes deployed in 2019. He was a key contributor to Dell EMC ECS (scale-out, geo-distributed object storage), Dell EMC Centera (a purpose-built archive solution), and Centera Virtual Archive, and is the author of 10+ patents and patent applications in the storage area.

The Two New I/O Controllers and BFQ

Tuesday, 9:30 am–10:00 am

Paolo Valente, Department of Physics, Computer Science and Mathematics - University of Modena and Reggio Emilia - Italy

Two new I/O controllers have landed in Linux: io.latency and io.cost. They are aimed at controlling latency and bandwidth, respectively. Yet both quantities are guaranteed by the BFQ I/O scheduler as well. So, when should we use BFQ, and when these controllers? Unfortunately, there is no comprehensive documentation answering this question.

To address this issue, this presentation compares the new controllers with BFQ, in terms of both interface and performance. Unfortunately, the most important result shown is that both controllers exhibit poor performance on all systems used in our tests. The root problem seems to be that they fail to control I/O under common workloads.

Paolo Valente, Department of Physics, Computer Science and Mathematics - University of Modena and Reggio Emilia - Italy

Paolo Valente is an Assistant Professor of Computer Science at the University of Modena and Reggio Emilia, Italy, and a collaborator of the Linaro engineering organization. Paolo's main activities focus on scheduling algorithms for storage devices, transmission links, and CPUs. In this respect, Paolo is the author of the latest version of the BFQ I/O scheduler. BFQ entered the Linux kernel in version 4.12, providing unprecedented low-latency and fairness guarantees. As for transmission links, Paolo is one of the authors of the QFQ packet scheduler, which was in the Linux kernel until 3.7, when it was replaced by QFQ+, a faster variant defined and implemented by Paolo himself. Finally, Paolo has also defined and implemented other algorithms, some of which are now in FreeBSD, and has provided new theoretical results on multiprocessor scheduling.

10:00 am–10:30 am

Break with Coffee and Tea

10:30 am–12:00 pm

Local Block Storage (continued)

Speeding Up Linux Disk Encryption

Tuesday, 10:30 am–11:00 am

Ignat Korchagin

Encrypting data at rest is a must-have for any modern SaaS company. If you run your software stack on Linux, LUKS/dm-crypt is the usual go-to solution. However, as storage becomes faster, the I/O latency introduced by dm-crypt becomes rather noticeable, especially on I/O-intensive workloads.

At first glance it may seem natural, because data encryption is considered an expensive operation. But most modern hardware platforms (specifically x86 and arm64) have hardware optimisations that make encryption fast and less CPU intensive. Nevertheless, even on such hardware, transparent disk encryption performs quite poorly.

When looking into the dm-crypt source code, we noticed that it has a lot of indirection and offloading: instead of encrypting/decrypting I/O requests synchronously, dm-crypt offloads every operation to a dedicated thread. By making a simple PoC patch that removes all the offloading code, we were able to speed up the overall read speed from an encrypted block device by 200%–300%, depending on the block size.
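The structural difference between the two paths can be sketched as follows (a toy model with a stand-in XOR "cipher", not the dm-crypt code): the stock path hands every request to a worker thread through a queue, while the PoC path runs the cipher inline in the submitting context, avoiding a wakeup and context switch per request.

```python
# Inline vs. thread-offloaded request processing; both yield the same
# bytes, but the offload path adds queueing and scheduling machinery.
import queue
import threading

KEY = 0x5A

def toy_encrypt(buf):
    # Stand-in for the real cipher; XOR keeps the sketch self-contained.
    return bytes(b ^ KEY for b in buf)

def inline_path(requests):
    # PoC behaviour: encrypt synchronously in the caller's context.
    return [toy_encrypt(r) for r in requests]

def offload_path(requests):
    # Stock behaviour: push every request to a dedicated worker thread.
    work, done = queue.Queue(), []
    def worker():
        while True:
            item = work.get()
            if item is None:
                return
            done.append(toy_encrypt(item))
    t = threading.Thread(target=worker)
    t.start()
    for r in requests:
        work.put(r)
    work.put(None)
    t.join()
    return done

reqs = [b"sector-0", b"sector-1"]
assert inline_path(reqs) == offload_path(reqs)  # same bytes, less machinery
```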

This talk aims to revisit the architecture and design choices of the dm-crypt module and research ideas on how to make Linux transparent disk encryption faster.

Ignat Korchagin, Cloudflare

Ignat is a systems engineer at Cloudflare working mostly on platform and hardware security. Ignat’s interests are cryptography, hacking, and low-level programming. Before Cloudflare, Ignat worked as a senior security engineer for Samsung Electronics’ Mobile Communications Division. His solutions may be found in many older Samsung smart phones and tablets. Ignat started his career as a security researcher in the Ukrainian government’s communications services.

Key Per IO Security Subsystem Class for NVM Express Storage Devices

Tuesday, 11:00 am–11:30 am

Sridhar Balasubramanian and Frederick Knight, NetApp, Inc.

The Key Per IO (KPIO) proposal is a joint initiative between NVMe and TCG to define a new KPIO Security Subsystem Class (SSC) under TCG Opal SSC.

Self-Encrypting Drives (SEDs) perform continuous encryption of user-accessible data based on contiguous LBA ranges per namespace. This is done at interface speeds, using a small number of keys generated and held in NVM by the storage device.

KPIO will allow large numbers of encryption keys to be managed and securely downloaded into the NVM subsystem. Encryption of user data then occurs on a per-command basis (each command may use a different key). This finer granularity of data encryption enables the following use cases:

  1. Easier support of the European Union's General Data Protection Regulation's (GDPR) “Right to be forgotten”
  2. Easier support of data erasure when data is spread over many disks (e.g., RAID/Erasure Coded)
  3. Easier support of data erasure of data that is mixed with other data needing to be preserved
  4. Assigning an encryption key to a single sensitive file or a host object
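The idea behind these use cases can be illustrated with a toy model of per-command keying (a toy XOR keystream, nothing like real SED cryptography, with invented names): each write names a key id, and discarding a key cryptographically erases only the data written under it.

```python
# Toy crypto-erase sketch: per-write key ids, key purge erases data.
import hashlib

keys = {1: b"tenant-a-key", 2: b"tenant-b-key"}

def keystream(key, length):
    # Counter-mode-style keystream from a hash; illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xcrypt(key_id, data):
    ks = keystream(keys[key_id], len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

# Each "LBA" records which key encrypted it, as a KPIO command would.
media = {
    0: (1, xcrypt(1, b"alice's record")),   # written under key 1
    1: (2, xcrypt(2, b"bob's record")),     # written under key 2
}

def read(lba):
    key_id, ciphertext = media[lba]
    return xcrypt(key_id, ciphertext) if key_id in keys else None

assert read(0) == b"alice's record"
del keys[1]                     # "right to be forgotten": purge one key
assert read(0) is None          # that data is now unrecoverable
assert read(1) == b"bob's record"  # everything else is untouched
```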

The talk will include a brief introduction to architectural differences between traditional SED and the KPIO SSC, followed by an overview of the proposed KPIO SSC standard, and subtle features of the KPIO SSC.

The talk will conclude by summarizing the current state of the standardization proposal with the NVMe and TCG working groups.

Sridhar Balasubramanian, NetApp, Inc.

Sridhar is currently working as a Principal Security Architect within the Product Security Group at NetApp. With over 25 years in the software industry, Sridhar is the inventor or co-inventor of 16 US patents and has published 7 conference papers to date. His areas of expertise include storage and information security, security assurance, the secure software development lifecycle, secure protocols, and storage management. Sridhar holds Master's degrees in Physics and Electrical Engineering.

Frederick Knight, NetApp, Inc.

Frederick Knight is a Principal Standards Technologist at NetApp Inc. Fred has over 40 years of experience in the computer and storage industry. He currently represents NetApp in several National and International Storage Standards bodies and industry associations, including T10 (SCSI), T11 (Fibre Channel), T13 (ATA), IETF (iSCSI), SNIA, and JEDEC. He was the chair of the SNIA Hypervisor Storage Interfaces working group, the primary author of the SNIA HSI White Paper, the author of the new IETF iSCSI update RFC, and the editor for the T10 SES-3 standard. He is also the editor for the SCSI Architecture Model (SAM-6) and the Convenor for the ISO/IEC JTC-1/SC25/WG4 international committee (which oversees the international standardization of T10/T11/T13 documents). Fred has received several NetApp awards for excellence and innovation as well as the INCITS Technical Excellence Award for his contributions to both T10 and T11 and the INCITS Merit Award for his longstanding contributions to the international work of INCITS.

He is also the developer of the first native FCoE target device in the industry. At NetApp, he contributes to technology and product strategy and serves as a consulting engineer to product groups across the company. Prior to joining NetApp, Fred was a Consulting Engineer with Digital Equipment Corporation, Compaq, and HP where he worked on clustered operating system and I/O subsystem design.

Local File System

Accelerating Filesystem Checking and Repair with pFSCK

Tuesday, 11:30 am–12:00 pm

David Domingo and Sudarsun Kannan, Rutgers University; Kyle Stratton

File system checking and recovery (C/R) tools play a pivotal role in increasing the reliability of storage software by identifying and correcting file system inconsistencies. However, with increasing disk capacity and data content, file system C/R tools notoriously suffer from long running times. We posit that current file system checkers fail to exploit the CPU parallelism and high throughput offered by modern storage devices.

To overcome these challenges, we propose pFSCK, a tool that redesigns C/R to enable fine-grained parallelism at the granularity of inodes without impacting the correctness of C/R’s functionality. To accelerate C/R, pFSCK first employs data parallelism by identifying functional operations in each stage of the checker and isolating dependent operations and their shared data structures. However, fully isolating shared structures is infeasible, consequently requiring serialization that limits scalability. To reduce the impact of synchronization bottlenecks and exploit CPU parallelism, pFSCK designs pipeline parallelism, allowing multiple stages of C/R to run simultaneously without impacting correctness. To realize efficient pipeline parallelism for different file system data configurations, pFSCK provides techniques for ordering updates to global data structures, efficient per-thread I/O cache management, and dynamic thread placement across different passes of a C/R. Finally, pFSCK designs a resource-aware scheduler aimed at reducing the impact on other applications sharing CPUs and the file system. Evaluation of pFSCK shows more than 3.72x gains over FSCK and 1.71x over the XFS checker, which provides coarse-grained parallelism.
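The pipeline-parallel idea in the abstract can be illustrated with a small sketch (our own toy model, not pFSCK's actual code): one checker pass streams inodes to the next pass through a queue, so both passes run concurrently, while updates to a shared result table are serialized with a lock, mirroring the synchronization bottleneck the abstract describes.

```python
import queue
import threading

SENTINEL = None  # marks end of the inode stream

def pass1_scan(inodes, out_q):
    """Pass 1: validate each inode and forward it as soon as it is checked."""
    for ino in inodes:
        ino["checked"] = True          # stand-in for real block-map checks
        out_q.put(ino)
    out_q.put(SENTINEL)

def pass2_connectivity(in_q, results, lock):
    """Pass 2: runs concurrently with pass 1; writes to the shared result
    table are serialized with a lock."""
    while True:
        ino = in_q.get()
        if ino is SENTINEL:
            break
        with lock:
            results[ino["num"]] = ino["checked"]

def run_pipeline(inodes):
    q = queue.Queue()
    results, lock = {}, threading.Lock()
    t1 = threading.Thread(target=pass1_scan, args=(inodes, q))
    t2 = threading.Thread(target=pass2_connectivity, args=(q, results, lock))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

The key property is that pass 2 starts consuming inodes before pass 1 has finished, which is what distinguishes pipeline parallelism from simply running each pass with more threads.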

12:00 pm–1:30 pm

Conference Luncheon

1:30 pm–3:30 pm

Network File Systems

What's New in Samba?

Tuesday, 1:30 pm–2:00 pm

Jeremy Allison, Samba Team, Google

Development of Samba, the Open Source File/Print/Authentication/Active Directory server for Linux and FreeBSD, is accelerating. Come and hear about the new developments in Samba, the most widely used File Server in the Open Source world, and increasingly used as a gateway to Cloud File Storage.

Jeremy Allison, Google

Jeremy Allison is a frequent speaker at Storage, Linux and Samba events and is one of the original members of the Samba team. He works for Google.

Opening up Linux to the Wider World: Status and Recent Progress in the Linux/POSIX Extensions to the SMB3.1.1 Protocol

Tuesday, 2:00 pm–2:30 pm

Steven French, Microsoft Azure Storage; Jeremy Allison, Samba Team, Google

The SMB3.1.1 POSIX Extensions, a set of protocol extensions that allow for optimal Linux and Unix interoperability with NAS and Cloud file servers, have made good progress over the past year in the Linux kernel client, the Samba server, and even the smbclient Samba tools (and now some third-party servers as well). We will discuss:

  • What is the current status?
  • How do new Linux file system features map to these extensions?
  • What have we learned (and, as a result, what has changed in the protocol specification)?
  • What are suggestions for implementors of SMB3.1.1 servers?
  • What is useful information for users to know in order to try these extensions?
  • Are future extensions planned?

These extensions greatly improve the experience for users of Linux, and will help make SMB3.1.1 even more broadly applicable for accessing files remotely to and from Linux (the SMB3 protocol family is already incredibly widely deployed across many operating systems, Samba and the cloud). This presentation will review the state of the protocol extensions and their current implementation in the Linux kernel and Samba among others, and provide an opportunity for feedback and suggestions for additions to the POSIX extensions.

This has been an exciting year with many improvements to the implementations of the SMB3.1.1 POSIX Extensions in Samba and Linux!

Steven French, Microsoft Azure Storage

Steve French is an expert on SMB3 and file systems. He is the original author and maintainer of the Linux CIFS/SMB3 client and a member of the Samba team. He works for Microsoft as a Principal Engineer in Azure Storage. He was previously File Systems Architect for the IBM Linux Technology Center and Chair of the SNIA CIFS Working Group, and is one of the more active developers in Linux kernel file systems.

Steve has spoken at the annual Storage Developer Conference and at SambaXP for the past five years or more, and has also spoken at Vault, Linux Plumbers, and the Linux FS Summit in recent years.

Jeremy Allison, Google

Jeremy Allison is a frequent speaker at Storage, Linux and Samba events and is one of the original members of the Samba team. He works for Google.

Implementing SMB Semantics in a Linux Cluster

Tuesday, 2:30 pm–3:00 pm

Volker Lendecke, Samba Team

To implement the SMB protocol, Samba has to implement semantics that are not covered by the Linux kernel API. The protocol elements to mention here are the concepts of share modes and leases, similar to NFSv4 share reservations and delegations. To implement those, Samba has to maintain data structures in user space and keep them consistent across cluster nodes. One of those data structures is a central table containing SMB-level information about all file open instances.

This talk will describe the semantics to be implemented, the challenges for clustered implementations of the SMB protocol, and the approaches taken by the Samba Team to make this scale well across nodes.
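A simplified sketch of the share-mode semantics involved (the names and bit layout here are illustrative, not Samba's actual data structures): a new open carries an access mask and a share mask, and it must both be tolerated by the share masks of every existing open of the file and tolerate the access those opens already hold.

```python
# Simplified SMB-style share-mode check. Both masks use the same bits here
# for brevity; the real protocol uses distinct access and share constants.
READ, WRITE, DELETE = 1, 2, 4

def conflicts(new_access, new_share, existing_opens):
    """Return True if the requested open conflicts with any existing open.

    existing_opens is a list of (access_mask, share_mask) pairs, one per
    current open handle on the file.
    """
    for access, share in existing_opens:
        if new_access & ~share:   # newcomer wants access others don't share
            return True
        if access & ~new_share:   # newcomer won't share access others hold
            return True
    return False
```

In a cluster, a table of these (access, share) pairs per file must be visible on every node before any open can be granted, which is exactly the consistency problem the talk addresses.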

Using kAFS on Linux for Network Home Directories

Tuesday, 3:00 pm–3:30 pm

Jonathan Billings, University of Michigan, College of Engineering, CAEN

The AFS filesystem has been in wide use at educational and research institutions since the mid-80s, and continues to be a service that many universities, including the University of Michigan, provide to students, staff, and faculty. The Linux kernel has recently improved support for the AFS filesystem, and now some Linux distributions provide support for AFS out of the box. I will discuss the history of AFS, the in-kernel AFS client, and its performance compared to the out-of-kernel OpenAFS client. I will demonstrate some of the benefits and limitations of using AFS as a home directory in a modern Linux distribution such as Fedora, including working with systemd and GNOME.

Jonathan Billings, University of Michigan, College of Engineering, CAEN

Jonathan Billings has been a senior systems programmer at the University of Michigan for the past ten years, supporting a Linux computing environment for students, researchers and faculty in the College of Engineering. He has worked as a systems administrator at Carnegie Mellon, Rutgers and Princeton. He has given talks at the Ohio LinuxFest.

3:30 pm–4:00 pm

Break with Refreshments

4:00 pm–5:30 pm

Kubernetes & Storage

Understanding Kubernetes Storage: Getting in Deep by Writing a CSI Driver

Tuesday, 4:00 pm–4:30 pm

Gerry Seidman, AuriStor

Understanding the many Kubernetes storage ‘objects’, along with their not-always-obvious interactions and life-cycles, can be daunting (Volumes, Persistent Volumes, Persistent Volume Claims, Volume Attachments, Storage Classes, Volume Snapshots, CSIDriver, CSINode, oh my...)

Perhaps the best way to glean a deep understanding of these storage objects, and of how storage-related scheduling works in Kubernetes, is to write a Container Storage Interface (CSI) driver. While most of us will never need to write a CSI driver, in this session we will make storage in Kubernetes more accessible by exploring it from the inside out, drawing on lessons learned from writing a CSI driver.

In this session you will obtain an understanding of:

  • How Kubernetes 'Volumes' relate to mounted storage available from within containers
  • The Kubernetes Declarative Model
  • The many Kubernetes Storage Objects
  • Kubernetes Scheduling and how it is influenced by storage objects
  • How Kubernetes controllers move the storage objects through their life-cycles
  • What is the role and responsibility of a storage type specific CSI Driver
  • What are the roles and responsibilities of CSI support ‘Side-Cars’
  • Putting it all together and relating this all back to how to write a CSI Driver
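As background for the life-cycle discussion above, the sequence of calls Kubernetes drives against a CSI driver for one persistent volume can be sketched as a toy model (real drivers implement these as gRPC services defined by the CSI specification; this version only records the state transitions, and the method names follow the CSI RPC names):

```python
# Toy model of the CSI call sequence for one volume: provision, attach to a
# node, stage (format/mount once per node), then publish (bind-mount into
# the pod). Not a real driver; it just tracks life-cycle state.

class ToyCSIDriver:
    def __init__(self):
        self.volumes = {}

    # Controller service: invoked via the external-provisioner side-car
    def create_volume(self, name, size_bytes):
        self.volumes[name] = {"size": size_bytes, "state": "created"}
        return name  # volume_id

    # Controller service: invoked via the external-attacher side-car
    def controller_publish_volume(self, vol_id, node_id):
        self.volumes[vol_id].update(state="attached", node=node_id)

    # Node service: mount the device at a per-node staging path
    def node_stage_volume(self, vol_id, staging_path):
        self.volumes[vol_id].update(state="staged", staging=staging_path)

    # Node service: bind-mount the staged volume into the pod's target path
    def node_publish_volume(self, vol_id, target_path):
        self.volumes[vol_id].update(state="published", target=target_path)
```

The division of labor in the sketch mirrors the side-car pattern the session covers: the provisioner and attacher side-cars call the controller service, while the kubelet calls the node service on the host where the pod lands.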

Gerry Seidman, AuriStor

Gerry Seidman has had a long career designing and implementing many complex, secure, high-performance, high-availability, and fault-tolerant distributed systems. He is President at AuriStor, where he is still very hands-on, including on the design and implementation of the AuriStor/AFS Kubernetes/CSI Driver.

Provisioning Object Storage In Kubernetes

Tuesday, 4:30 pm–5:00 pm

Afreen Rahman, Red Hat

Have you ever wondered how to provision object storage in your existing Kubernetes ecosystem, or how the already-available options (e.g., Ceph) provide bucket provisioning? This talk will cover:

  • What is bucket provisioning?
  • Why is it not natively supported in Kubernetes?
  • What is the concept of OB/OBC for provisioning object storage?
  • What works behind the scenes: the creation of CRDs, RoleBindings, StorageClasses, ConfigMaps, Secrets, and the underlying architecture.
  • A demo explaining the flow and the creation of buckets with OB/OBC.
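The behind-the-scenes flow in the list above can be sketched roughly as follows (a hypothetical model of a bucket provisioner's reconcile step; the field names and store layout are ours, not the actual OB/OBC controller API): when a new ObjectBucketClaim (OBC) appears, the provisioner creates the bucket, records it in an ObjectBucket (OB), and emits the Secret and ConfigMap an application pod consumes.

```python
# Toy reconcile step for an ObjectBucketClaim. `store` stands in for both
# the object store and the Kubernetes API; credentials are placeholders.

def reconcile_obc(obc, store):
    bucket_name = f"{obc['name']}-bucket"
    store["buckets"].add(bucket_name)                      # provision bucket
    store["objectbuckets"][obc["name"]] = {"bucket": bucket_name}  # the OB
    store["secrets"][obc["name"]] = {                      # credentials for the pod
        "AWS_ACCESS_KEY_ID": "<generated>",
        "AWS_SECRET_ACCESS_KEY": "<generated>",
    }
    store["configmaps"][obc["name"]] = {                   # endpoint info for the pod
        "BUCKET_NAME": bucket_name,
        "BUCKET_HOST": store["endpoint"],
    }
```

The pod then mounts the Secret and ConfigMap by the claim's name, which is what makes the OBC pattern feel analogous to a PersistentVolumeClaim for block and file storage.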

Afreen Rahman, Red Hat

Afreen Rahman writes code nowadays at Red Hat, currently working on NooBaa (MCG), OpenShift, Go, React, and TypeScript. She maintains the plugin packages for NooBaa and Ceph in the OpenShift console. She previously interned at Hasura on the Hasura K8s ecosystem, is a past GHCI 2018 scholar, mentors her college juniors via newsletters and monthly sessions, and loves playing arcade games.

Lustre in Kubernetes

Tuesday, 5:00 pm–5:30 pm

Dan Lambright, Facebook; Nakul Vankadari Ramesh, Akriti Bhat, Xing Du, and Anand Kumar, Northeastern University

Lustre is a distributed file system popular with high performance computing (HPC) workloads. In this talk, we will describe how we run Lustre within Kubernetes. This makes Lustre portable across different cloud platforms and helps automate deployment, elasticity, and management. We’ll show how we used KubeVirt to get Lustre’s kernel modules into containers and show our performance measurements using that infrastructure. Our objective has been to integrate Lustre as closely as possible within Kubernetes, and we will demonstrate how we provision Lustre infrastructure using standard YAML configuration files.

The second part of our talk will discuss our work-in-progress to leverage the Rook framework. We wish to use it to help implement features such as autoscaling and node failure recovery. We are developing a Kubernetes operator that interacts with the KubeVirt operator for VM provisioning and other tasks. We hope to collaborate with the Rook community on this.

We are a team of four graduate students from Northeastern University, mentored by an industry specialist. Our work was done on the Massachusetts Open Cloud (MOC).

Dan Lambright, Facebook

Dan Lambright has given talks at Vault in the past. He is currently at Facebook working on social graph consistency, and enjoying working with the team at Northeastern.

Nakul Vankadari Ramesh, Northeastern University

Nakul Vankadari Ramesh previously worked at Intel, Schneider Electric and Philips. He is keen on learning cloud technologies for large scale systems and enjoys exploring various efforts in open-source.

Akriti Bhat, Northeastern University

Akriti Bhat previously worked with J.P. Morgan and Amazon Web Services. She is passionate about developing software to support large scale applications.

Anand Kumar, Northeastern University

Anand Kumar is pursuing concurrent Bachelors and Masters degrees in Computer Science. Throughout academics and his internships at Google and Circle, he has enjoyed creating software to build comprehensive user experiences.

5:30 pm–7:00 pm

Dinner (on your own)

7:00 pm–10:00 pm

Birds-of-a-Feather Sessions (BoFs)

The evening Birds-of-a-Feather Sessions (BoFs) will be a forum for open discussion on topics of interest to the community. A few participants will present short introductions, overviews, or status reports from projects relevant to each of the topics, followed by informal discussion and participation. If you wish to present on a particular topic and have not already been contacted, please send us a short proposal at vault20chairs@usenix.org.