All sessions will take place in OCC Room 208 at the Oakland Marriott City Center and the Oakland Convention Center.
Thursday, February 15, 2018
12:00 pm–1:00 pm
Lunch and Introductions
1:00 pm–1:30 pm
Adam Manzanares, WDC
With the introduction of low-latency, byte-addressable storage devices, it becomes difficult to decide how to expose these devices to user-space applications. Do we treat them as traditional block devices, or expose them as DAX-capable devices? A traditional block device allows us to use the page cache to take advantage of locality in access patterns, but at the expense of extra memory copies that are extremely costly for random workloads. A DAX-capable device seems great for the aforementioned random-access workload, but suffers once there is some locality in the access pattern.
When DAX-capable devices are used as slower/cheaper volatile memory, treating them as a slower NUMA node with an associated NUMA migration policy would allow for taking advantage of access pattern locality. However, this approach suffers from a few drawbacks. First, when those devices are also persistent, the tiering approach used in NUMA migration may not guarantee persistence. Second, for devices with significantly higher latencies than DRAM, the cost of moving clean pages may be significant. Finally, pages handled via NUMA migration are a common resource subject to thrashing under memory pressure.
I would like to discuss an alternative approach where memory-intensive applications mmap these storage devices into their address space. The application can specify how much DRAM may be used as a cache and have some influence on prefetching and eviction policies. The goal of such an approach is to minimize the impact that the slightly slower memory could have on the system when it is treated as a kernel-managed global resource, as well as to enable use of those devices as persistent memory. As an aside, we (ab)used ;) the vm_insert_page function in a prototype and found it faster than the page cache and swapping mechanisms when limited to a small amount of DRAM.
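The application-side interface described above can be sketched from userspace (a minimal Python illustration; a temporary file stands in for the byte-addressable device, and the application-managed DRAM cache and prefetch/eviction policies are elided):

```python
import mmap
import os
import tempfile

PAGE = 4096

# A temporary file stands in for the byte-addressable device; a real
# application would mmap the device (or a DAX-mapped file) instead.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 16 * PAGE)

with mmap.mmap(fd, 16 * PAGE) as m:
    # Loads and stores go through the mapping: no read()/write() system
    # calls and no extra copy on the data path once the mapping exists.
    m[0:5] = b"hello"
    m[8 * PAGE:8 * PAGE + 5] = b"world"
    first = bytes(m[0:5])
    second = bytes(m[8 * PAGE:8 * PAGE + 5])

os.close(fd)
os.unlink(path)
```

With a real device mapping, the kernel's page cache is out of the picture, which is exactly why the application must then supply its own caching and eviction policy.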
I believe it would be beneficial to talk with kernel developers to get their opinion on such an approach. Academics can be exposed to the challenges that industry researchers are currently grappling with.
1:40 pm–2:10 pm
Theodore Ts'o, Google
I'd like to talk about a proposal to implement and upstream something that we've been calling fs-verity, which is similar to dm-verity but implemented on a per-file basis. It will be implemented much like fs/crypto, in that most of the code will be in a generic layer, with minimal modifications needed in the file system layer.
The Merkle tree will be located after the file's normal data, and then after the package manager sets the verity bit, i_size will be updated so that the fs-verity header and Merkle tree will be "hidden" from userspace and the file will become immutable.
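The Merkle-tree idea can be sketched as follows (an illustrative Python sketch, not the actual fs-verity code; the block size and SHA-256 hash are assumptions). Building the tree over the file's blocks lets each block be checked individually against the root as it is read, rather than checksumming the whole file up front:

```python
import hashlib

BLOCK = 4096  # assumed block size for illustration


def merkle_tree(data: bytes):
    """Build a binary Merkle tree over fixed-size blocks.

    Returns all tree levels, leaves first, root last."""
    level = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    levels = [level]
    while len(level) > 1:
        level = [hashlib.sha256(
                     level[i] + (level[i + 1] if i + 1 < len(level) else b"")
                 ).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(block: bytes, index: int, levels) -> bool:
    """Check one block against the stored tree, hashing up to the root --
    the per-read verification step, without touching other blocks' data."""
    h = hashlib.sha256(block).digest()
    for level in levels[:-1]:
        if h != level[index]:
            return False
        sibling = level[index ^ 1] if (index ^ 1) < len(level) else b""
        h = hashlib.sha256(h + sibling if index % 2 == 0 else sibling + h).digest()
        index //= 2
    return h == levels[-1][0]
```

The key property for the latency argument below is that verify_block touches only one data block plus O(log n) hashes.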
How does this differ from IMA's file integrity?
*) The pages are verified as they are read from the storage device, rather than all at once; this avoids a large latency hit when the file is first opened or referenced.
*) The design and code are done by file system developers, so it doesn't have the locking problems of the IMA code.
The initial use case of this will be for Android, where the latency cost of doing the full checksum at file open time is an important concern.
In the future, the fact that a file has been signed using fs-verity, using a PKCS 11 signature with a key on a trusted keyring (possibly the same one used for signed kernel modules, or perhaps a separate keyring), could be used as input into a security policy that requires this for, say, setuid executables, setuid shell scripts, etc.
Most of this feature could also be used with a non-cryptographic checksum to provide data checksums for read-only files in a general way for all file systems. It wouldn't be as flexible as btrfs, but for files being stored for backup purposes, it should work quite well.
3:00 pm–3:45 pm
Dan Williams, Intel
With v4.15 of the Linux kernel the Filesystem-DAX implementation has reached a level of functionality that satisfies a wide range of persistent memory applications, but there is more work to do. This "What's Next for Filesystem-DAX" session will discuss gaps and next steps in this space. Potential topics include:
- RDMA interfaces for long term memory registration
- DAX implications for filesystem metadata management
- Error handling, i.e. going beyond "bad blocks" lists at the block device layer, and stray write protection
- Performance topics, for example what to do about cases where persistent memory could benefit from DRAM page cache.
- Gigantic page support, i.e., can current Linux filesystems support 1GB page mappings?
3:55 pm–4:40 pm
Round table—ask the Linux developers anything; refreshments provided.
4:45 pm–5:30 pm
Om Rameshwar Gatla, New Mexico State University
File systems are built to efficiently store data, but in scenarios such as system failures, file systems may fail to preserve the integrity of the data. In such scenarios, most file systems employ a checker program to recover from a corrupted state. A file system checker usually scans the entire file system and checks a set of consistency rules to modify the metadata and bring the file system back to a consistent state. Depending on the corruption scenario, checkers may take a long time to fix inconsistencies. We can also observe that checkers directly update the on-disk metadata structures. Many modern file systems employ various techniques to mitigate the effects of system crashes while modifying their metadata, but no such protection is available while running the checker. Therefore, any interruption to the repair procedure may further corrupt the file system.
In our work we study the behavior of checkers under interruption. We have observed that interrupting the repair procedure can cause irreparable damage to the file system. To mitigate this issue, we have designed and implemented rfsck-lib, a general write-ahead logging library with a simple interface. Based on the similarities in the implementation of existing checkers, rfsck-lib decouples logging from the repair procedure and provides an interface to log repairing writes at fine granularity.
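The write-ahead-logging idea can be sketched as follows (a hedged Python illustration; RepairLog, log_write, commit, and replay are made-up names, not rfsck-lib's actual interface). Repairing writes are first appended to a log terminated by a commit record; only committed batches are applied, so an interruption either leaves the old state intact or is recoverable by replay:

```python
import json
import os
import tempfile


class RepairLog:
    """Illustrative write-ahead log for checker repairs (not rfsck-lib)."""

    def __init__(self, path):
        self.path = path
        self.pending = []

    def log_write(self, offset, data):
        # Record the repair instead of updating on-disk metadata in place.
        self.pending.append({"off": offset, "data": data.hex()})

    def commit(self):
        # The commit record makes the batch atomic: replay ignores a log
        # that lacks it, so a torn log leaves the old state untouched.
        with open(self.path, "w") as f:
            for rec in self.pending:
                f.write(json.dumps(rec) + "\n")
            f.write("COMMIT\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self, disk: bytearray):
        # Apply committed repairs; idempotent, so safe to re-run after a crash.
        try:
            lines = open(self.path).read().splitlines()
        except FileNotFoundError:
            return False
        if not lines or lines[-1] != "COMMIT":
            return False  # uncommitted log: discard it
        for line in lines[:-1]:
            rec = json.loads(line)
            data = bytes.fromhex(rec["data"])
            disk[rec["off"]:rec["off"] + len(data)] = data
        return True


# Simulated recovery: a later run replays the committed repairs onto the disk.
fd, log_path = tempfile.mkstemp()
os.close(fd)
log = RepairLog(log_path)
log.log_write(0, b"FIXD")
log.log_write(6, b"OK")
log.commit()
disk = bytearray(8)
replayed = log.replay(disk)
os.unlink(log_path)
```

Because replay is idempotent, a crash during recovery itself is also safe: the next run simply replays the same committed log again.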