USENIX - FAST '02

Works-in-Progress Reports (WiPs)
Session Chair: Scott Brandt, University of California, Santa Cruz

Monday, January 28, 2002

8:00 pm—10:00 pm

The WIP Program:

NFS over RDMA, Brent Callaghan, Sun Microsystems
The Case for Massive Arrays of Idle Disks (MAID), Dennis Colarelli, Dirk Grunwald, and Michael Neufeld, University of Colorado, Boulder
Cooperative Backup System, Sameh Elnikety, Rice University; Mark Lillibridge, Compaq SRC; Mike Burrows, Microsoft Research; and Willy Zwaenepoel, Rice University
Federated File Systems for Clusters with Remote Memory Communication, Suresh Gopalakrishnan, Ashok Arumugam, and Liviu Iftode, Rutgers University
An Iterative Technique for Distilling a Workload's Important Performance Information, Zachary Kurmas, Georgia Tech; and Kimberly Keeton, HP Labs
Larger Disk Blocks or Not? Steve McCarthy, Mike Leis, and Steve Byan, Maxtor Corporation
Lazy Parity Update: A Technique to Improve Write I/O Performance of Disk Array Tolerating Double Disk Failures, Young Jin Nam, Dae-Woong Kim, Tae-Young Choe, and Chanik Park, Pohang University of Science and Engineering, Kyungbuk, Republic of Korea
The Armada Framework for Parallel I/O on Computational Grids, Ron Oldfield and David Kotz, Dartmouth College
IBM Storage Tank™: A Distributed Storage System, D. A. Pease, R. M. Rees, W. C. Hineman, D. L. Plantenberg, R. A. Becker-Szendy, R. Ananthanarayanan, M. Sivan-Zimet, C. J. Sullivan, IBM Almaden Research Center; R. C. Burns, Johns Hopkins University; D. D. E. Long, University of California, Santa Cruz
Data Placement Based on the Seek Time Analysis of a MEMS-based Storage Device, Zachary N. J. Peterson, Scott A. Brandt, Darrell D. E. Long, University of California, Santa Cruz
Logistical Networking Research and the Network Storage Stack, James S. Plank, Micah Beck, and Terry Moore, University of Tennessee
Enhancing NFS Cross-Administrative Domain Access, Joseph Spadavecchia and Erez Zadok, Stony Brook University
StorageAgent: An Agent-based Approach for Dynamic Resource Sharing in a Storage Service Provider (SSP) Infrastructure, Sandeep Uttamchandani, IBM Almaden Research Center
Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System, An-I A. Wang, Peter Reiher, Gerald J. Popek, University of California, Los Angeles; Geoffrey H. Kuenning, Harvey Mudd College

NFS over RDMA
Brent Callaghan, Sun Microsystems
Network bandwidth is growing by orders of magnitude. Yet conventional processing of NFS traffic over gigabit networks gobbles CPU. Using RDMA protocols, we expect NFS to make full and efficient use of gigabit networks.
The Case for Massive Arrays of Idle Disks (MAID)
Dennis Colarelli, Dirk Grunwald, and Michael Neufeld, University of Colorado, Boulder
The declining cost of commodity disk drives is rapidly changing the economics of deploying large amounts of on-line storage. Conventional mass storage systems typically use high-performance RAID clusters as a disk cache, often with a file system interface. The disk cache is backed by tape libraries, which serve as the final repository for data. In mass storage systems where performance is an issue, tape may serve only as a deep archive for disaster recovery purposes. In this case, all data is stored on the disk farm. If a high-availability system is required, the data is often duplicated on a separate system, with a fail-over mechanism controlling access.
This work explores an alternative design using massive arrays of idle disks, or MAID. We argue that this storage organization provides storage densities matching or exceeding those of tape libraries with performance similar to disk arrays. Moreover, we show that through a combination of effective power management of individual drives and the use of cache or migration, this performance can be achieved using a very small power envelope.
We examine the issues critical to the performance, energy consumption and practicality of several classes of MAID systems. The potential of MAID to save energy costs with a relatively small performance penalty is demonstrated in a comparison with a conventional RAID 0 storage array.
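As an illustration of per-drive power management, the following is a minimal sketch of an idle-timeout spin-down policy. It is not the authors' design: the class names, the timeout parameter, and the policy itself are assumptions made for this example, and the cache or migration layer mentioned in the abstract is left out.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical model of one drive in the array."""
    spinning: bool = False
    last_access: float = 0.0
    spin_ups: int = 0


class IdleTimeoutPolicy:
    """Spin a drive down once it has been idle for idle_timeout_s seconds.

    Assumption: a fixed timeout per drive; real systems may use other policies.
    """

    def __init__(self, n_drives: int, idle_timeout_s: float) -> None:
        self.drives = [Drive() for _ in range(n_drives)]
        self.idle_timeout_s = idle_timeout_s

    def tick(self, now: float) -> None:
        """Spin down every drive whose idle time has reached the timeout."""
        for d in self.drives:
            if d.spinning and now - d.last_access >= self.idle_timeout_s:
                d.spinning = False

    def access(self, drive_id: int, now: float) -> None:
        """Serve a request, spinning the target drive up if it is idle."""
        self.tick(now)
        d = self.drives[drive_id]
        if not d.spinning:
            d.spinning = True
            d.spin_ups += 1
        d.last_access = now

    def spinning_count(self) -> int:
        """Number of drives currently drawing full power."""
        return sum(d.spinning for d in self.drives)


policy = IdleTimeoutPolicy(n_drives=4, idle_timeout_s=10.0)
policy.access(0, now=0.0)
policy.access(1, now=5.0)
policy.tick(now=12.0)           # drive 0 has been idle 12 s, drive 1 only 7 s
print(policy.spinning_count())  # 1
```

In this sketch the energy saving comes from keeping `spinning_count()` low, and the performance penalty shows up as `spin_ups`, since each one delays the request that triggered it.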

Cooperative Backup System
Sameh Elnikety, Rice University; Mark Lillibridge, Compaq SRC; Mike Burrows, Microsoft Research; and Willy Zwaenepoel, Rice University
This paper presents the design of a novel backup system built on top of a peer-to-peer architecture with minimal supporting infrastructure. The system can be deployed for both large-scale and small-scale peer-to-peer overlay networks. It allows computers connected to the Internet to back up their data cooperatively. Each computer has a set of partner computers and distributes its backup data among those partners. In return, it holds backup data for its partners. The data is stored in such a way as to achieve both fault tolerance and high reliability. This form of cooperation poses several interesting technical challenges because these computers have independent failure modes, do not trust each other, and are subject to third-party attacks.
Federated File Systems for Clusters with Remote Memory Communication
Suresh Gopalakrishnan, Ashok Arumugam, and Liviu Iftode, Rutgers University
We present the design, prototype implementation and initial evaluation of FedFS, a novel cluster file system architecture that provides a global file space by aggregating the local file systems of the cluster nodes into a loose federation. The federated file system (FedFS) is created ad hoc for a distributed application that runs on the cluster, and its lifetime is limited by the lifetime of the distributed application. FedFS provides location-independent global file naming, load balancing, and file migration and replication. It relies on the local file systems to perform the file I/O operations.
The local file systems retain their autonomy, in the sense that their structure and content do not change to support the federated file system. Other applications may run on the local file systems without realizing that the same file system is part of one or multiple FedFS. If the distributed application permits, nodes can dynamically join or leave the federation anytime, with no modifications required to the local file system organization.
FedFS is implemented as an I/O library over VIA, which supports remote memory operations. The applicability and performance of the federated file system architecture are evaluated by building a distributed NFS file server.
An Iterative Technique for Distilling a Workload's Important Performance Information
Zachary Kurmas, Georgia Tech; Kimberly Keeton, HP Labs
Larger Disk Blocks or Not?
Steve McCarthy, Mike Leis, and Steve Byan, Maxtor Corporation
The recent annual compound growth rate of disk drive areal density has been 100%, a doubling of capacity every year. This growth rate is faster than Moore's Law: advances in disk technology have been outpacing advances in semiconductor technology. Part of the reason for this spectacular growth rate is that areal density is a two-dimensional problem. Succeeding product generations increase both the number of tracks per inch (TPI) radially and the number of linear bits per inch (BPI) circumferentially. However, both parameters are facing technical challenges that may slow the rate of capacity growth. In this paper, we will briefly examine some of the obstacles to increased BPI and propose an increase in sector size as an aid to surmounting them.
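The two-dimensional nature of areal density can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that TPI and BPI grow at the same rate; the starting values are hypothetical and are not taken from the paper.

```python
import math


def areal_density(tpi: float, bpi: float) -> float:
    """Areal density in bits per square inch: tracks/inch times bits/inch."""
    return tpi * bpi


def per_axis_growth(areal_growth: float) -> float:
    """Annual growth factor needed on each axis.

    Assumption: TPI and BPI grow equally, so each contributes the square
    root of the overall areal growth factor.
    """
    return math.sqrt(areal_growth)


# Hypothetical starting point, chosen only to make the arithmetic visible.
tpi, bpi = 50_000.0, 500_000.0

factor = per_axis_growth(2.0)  # 100% annual areal growth is a factor of 2
before = areal_density(tpi, bpi)
after = areal_density(tpi * factor, bpi * factor)

print(f"per-axis growth: {(factor - 1) * 100:.1f}% per year")
print(f"areal density ratio after one year: {after / before:.2f}")
```

Under the equal-split assumption, each axis needs only about 41% annual growth to double areal density. If one axis stalls, the other has to supply the full factor of two on its own.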
Lazy Parity Update: A Technique to Improve Write I/O Performance of Disk Array Tolerating Double Disk Failures
Young Jin Nam, Dae-Woong Kim, Tae-Young Choe, and Chanik Park, Pohang University of Science and Engineering, Kyungbuk, Republic of Korea
The Armada Framework for Parallel I/O on Computational Grids
Ron Oldfield and David Kotz, Dartmouth College
IBM Storage Tank™: A Distributed Storage System
D. A. Pease, R. M. Rees, W. C. Hineman, D. L. Plantenberg, R. A. Becker-Szendy, R. Ananthanarayanan, M. Sivan-Zimet, C. J. Sullivan, IBM Almaden Research Center; R. C. Burns, Johns Hopkins University; D. D. E. Long, University of California, Santa Cruz
IBM Storage Tank™ is a SAN-based distributed object storage system for use in heterogeneous environments. It provides performance comparable to that of file systems built on bus-attached, high-performance storage, as well as advanced storage and data management functions. It is designed to be highly available and scalable. The Storage Tank project has been underway at IBM's Almaden Research Center for several years.
Storage Tank is designed to work with any Storage Area Network architecture, as well as with any SAN storage hardware. (It currently runs on both Fibre Channel and iSCSI SANs.) It is also designed to be portable to essentially any host system architecture.
This paper provides a high-level overview of Storage Tank's design and features.
Data Placement Based on the Seek Time Analysis of a MEMS-based Storage Device
Zachary N. J. Peterson, Scott A. Brandt, Darrell D. E. Long, University of California, Santa Cruz
Reducing access times to secondary I/O devices has long been the focus of many systems researchers. With traditional disk drives, access time is composed of transfer time, seek time and rotational latency, so many techniques to minimize these factors, such as ordering I/O requests or intelligently placing data, have been developed. MEMS-based storage devices are seen by many as a replacement or an augmentation for modern disk drives, but algorithms for reducing access time for MEMS-based storage are still poorly understood. These devices, based on MicroElectroMechanical systems (MEMS), use thousands of active read/write heads working in parallel on a non-rotating magnetic substrate, eliminating rotational latency from the access time equation. This leaves seek time as the dominant variable. Therefore, new data layout techniques that minimize seek time by exploiting the unique seek characteristics of a MEMS-based storage device can be developed. This paper begins to examine the access qualities of a MEMS-based storage device and, based on experimental simulation, develops an understanding of the seek time characteristics of such a device. These characteristics then allow us to identify equivalent regions in which to place data for improved access.
Logistical Networking Research and the Network Storage Stack
James S. Plank, Micah Beck, and Terry Moore, University of Tennessee
Enhancing NFS Cross-Administrative Domain Access
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The access model of exporting NFS volumes to clients suffers from two problems. First, the server depends on the client to specify the user credentials to use and has no flexible mechanism to map or restrict the credentials given by the client. Second, there is no mechanism to hide data from users who do not have privileges to access it. Although NFSv4 promises to fix the first problem using universal identifiers, it does not provide a mechanism for hiding data and is not expected to be in wide use for a long time.
We address these problems by a combination of two solutions. First, range-mapping is a mechanism that allows the NFS server to restrict and flexibly map the credentials set by the client. Second, file-cloaking allows the server to control the data a client is able to view or access beyond normal Unix semantics. Our design is compatible with all versions of NFS, including NFSv4. We have implemented this work in Linux and made changes only to the NFS server code; client-side NFS and the NFS protocol remain unchanged. Our evaluation shows a minimal average performance overhead and, in some cases, an end-to-end performance improvement.
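The idea of range-mapping can be sketched as a server-side lookup table that translates contiguous ranges of client user IDs into server user IDs and squashes everything else. This is a hypothetical illustration, not the authors' implementation or configuration format: the names `RangeMap`, `map_uid` and `NOBODY_UID` are invented for the example, and the fallback ID of 65534 is an assumption.

```python
from typing import NamedTuple, Sequence

# Assumption: IDs outside every range are squashed to an unprivileged user.
NOBODY_UID = 65534


class RangeMap(NamedTuple):
    """One hypothetical mapping rule: client IDs [client_lo, client_hi]
    are shifted so that client_lo lands on server_lo."""
    client_lo: int
    client_hi: int  # inclusive
    server_lo: int


def map_uid(client_uid: int, ranges: Sequence[RangeMap],
            default: int = NOBODY_UID) -> int:
    """Translate the UID a client presents into the UID the server uses.

    The first matching range wins; unmatched IDs fall back to `default`,
    which is how this sketch restricts credentials.
    """
    for r in ranges:
        if r.client_lo <= client_uid <= r.client_hi:
            return r.server_lo + (client_uid - r.client_lo)
    return default


# Client users 1000..1999 become server users 5000..5999.
rules = [RangeMap(client_lo=1000, client_hi=1999, server_lo=5000)]

print(map_uid(1500, rules))  # 5500
print(map_uid(0, rules))     # 65534: client root is not trusted here
```

Because the table lives entirely on the server, a scheme of this shape needs no change to the client or to the protocol, which matches the server-only modification described above.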
StorageAgent: An Agent-based Approach for Dynamic Resource Sharing in a Storage Service Provider (SSP) Infrastructure
Sandeep Uttamchandani, IBM Almaden Research Center
In an SSP infrastructure, the resources of the storage server, namely cache, memory and CPU, are shared in an ad hoc manner among the clients. These resources play an important role in determining the overall throughput and latency of data access. In this paper, we propose StorageAgent, a systematic, secure and efficient approach for distributing resources. Built on agent-based semantics for dynamic resource sharing, StorageAgent achieves the following goals. First, available resources are used efficiently, as there are well-defined semantics for lending and reclaiming resources. Second, security of data is ensured, as access to borrowed resources is controlled solely by trusted agents. Third, fine-grained control and metering of resources used by individual clients is possible.
Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System
An-I A. Wang, Peter Reiher, and Gerald J. Popek, University of California, Los Angeles; Geoffrey H. Kuenning, Harvey Mudd College
Conquest is a disk/persistent-RAM hybrid file system that is incrementally deployable and realizes most of the benefits of cheaply abundant persistent RAM. Conquest consists of two specialized and simplified data paths for in-core and on-disk storage and outperforms popular disk-based file systems by 43% to 97%.