NFS over RDMA
Brent Callaghan, Sun Microsystems
Network bandwidth is growing by orders of
magnitude. Yet conventional processing of NFS
traffic over gigabit networks gobbles CPU. Using
RDMA protocols, we expect NFS to make full and
efficient use of gigabit networks.
The Case for
Massive Arrays of Idle Disks (MAID)
Dennis Colarelli, Dirk Grunwald, and Michael Neufeld,
University of Colorado, Boulder
The declining cost of commodity disk drives is rapidly changing the economics of deploying large
amounts of on-line storage. Conventional mass storage systems typically use high performance RAID
clusters as a disk cache, often with a file system interface. The disk cache is backed by tape libraries
which serve as the final repository for data. In mass storage systems where performance is an issue,
tape may serve only as a deep archive for disaster recovery purposes. In this case, all data is stored
on the disk farm. If a high availability system is required, the data is often duplicated on a separate
system, with a fail-over mechanism controlling access.
This work explores an alternative design using massive arrays of idle disks, or MAID. We argue
that this storage organization provides storage densities matching or exceeding those of tape libraries
with performance similar to disk arrays. Moreover, we show that through a combination of effective
power management of individual drives and the use of cache or migration, this performance can be
achieved using a very small power envelope.
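The abstract does not spell out MAID's policies, so the following is only a minimal sketch under stated assumptions: each data drive spins down after a fixed idle timeout, and a small always-on LRU cache absorbs repeated reads so that only cache misses wake a drive. The names `Drive`, `MaidArray`, `idle_timeout`, and `cache_blocks`, and the block-to-drive mapping `block % n_drives`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """A data drive that spins down after a fixed idle timeout (assumed policy)."""
    idle_timeout: float      # seconds of inactivity before spin-down (hypothetical)
    spun_up: bool = False
    last_access: float = 0.0
    spin_ups: int = 0

    def tick(self, now: float) -> None:
        # Spin the drive down once it has been idle for at least the timeout.
        if self.spun_up and now - self.last_access >= self.idle_timeout:
            self.spun_up = False

    def access(self, now: float) -> None:
        self.tick(now)
        if not self.spun_up:  # a request to a sleeping drive forces a spin-up
            self.spun_up = True
            self.spin_ups += 1
        self.last_access = now


class MaidArray:
    """Reads go to a small always-on LRU cache first; only misses wake a drive."""

    def __init__(self, n_drives: int, cache_blocks: int, idle_timeout: float):
        self.drives = [Drive(idle_timeout) for _ in range(n_drives)]
        self.cache: dict[int, None] = {}  # insertion order doubles as LRU order
        self.cache_blocks = cache_blocks
        self.hits = 0
        self.misses = 0

    def read(self, block: int, now: float) -> None:
        if block in self.cache:
            self.hits += 1
            del self.cache[block]
            self.cache[block] = None  # move to the most-recently-used end
            return
        self.misses += 1
        self.drives[block % len(self.drives)].access(now)  # hypothetical mapping
        self.cache[block] = None
        if len(self.cache) > self.cache_blocks:
            del self.cache[next(iter(self.cache))]  # evict least recently used

    def active_drives(self, now: float) -> int:
        for drive in self.drives:
            drive.tick(now)
        return sum(drive.spun_up for drive in self.drives)
```

With a skewed workload, most reads hit the cache and the untouched drives stay spun down. Migration, the other option the abstract mentions, is not modeled here.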
We examine the issues critical to the performance, energy consumption and practicality of several
classes of MAID systems. The potential of MAID to save energy costs with a relatively small
performance penalty is demonstrated in a comparison with a conventional RAID 0 storage array.
Cooperative Backup System
Sameh Elnikety, Rice University; Mark Lillibridge, Compaq SRC; Mike Burrows, Microsoft Research; and Willy Zwaenepoel, Rice University
This paper presents the design of a novel backup system built on top of a peer-to-peer architecture with
minimal supporting infrastructure. The system can be deployed for both large-scale and small-scale peer-to-peer
overlay networks. It allows computers connected to the Internet to back up their data cooperatively. Each
computer has a set of partner computers and stores its backup data distributively among those partners. In return,
such a way as to achieve both fault-tolerance and high reliability. This form of cooperation poses several
interesting technical challenges because these computers have independent failure modes, do not trust each
other, and are subject to third party attacks.
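As an illustration only (the abstract does not describe the placement scheme), the sketch below assumes that each backup block is replicated on `k` distinct partners chosen by simple rotation, and that the owner keeps a SHA-256 digest of each block so it can check data returned by partners it does not trust. `place_replicas`, `restorable`, and `verify` are hypothetical helpers.

```python
import hashlib


def place_replicas(n_blocks: int, partners: list[str], k: int) -> dict[int, list[str]]:
    """Assign each block to k distinct partners, rotating the starting partner."""
    if not 0 < k <= len(partners):
        raise ValueError("k must be between 1 and the number of partners")
    return {
        block: [partners[(block + i) % len(partners)] for i in range(k)]
        for block in range(n_blocks)
    }


def restorable(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live partner."""
    return all(
        any(p not in failed for p in holders) for holders in placement.values()
    )


def verify(block: bytes, expected_digest: str) -> bool:
    """Check data returned by an untrusted partner against a locally kept digest."""
    return hashlib.sha256(block).hexdigest() == expected_digest
```

With `k = 2`, the loss of any single partner still leaves every block restorable, while a tampered block fails `verify` and can be fetched from another replica.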
Federated File Systems for Clusters with Remote Memory
Suresh Gopalakrishnan, Ashok Arumugam, and Liviu Iftode,
We present the design, prototype implementation and
initial evaluation of FedFS, a novel cluster file system
architecture that provides a global file space by aggregating the local file systems of the cluster nodes into
a loose federation. The federated file system (FedFS)
is created ad-hoc for a distributed application that
runs on the cluster, and its lifetime is limited by the
lifetime of the distributed application. FedFS provides location-independent global file naming, load
balancing, and file migration and replication. It relies on the local file systems to perform the file I/O.
The local file systems retain their autonomy, in the
sense that their structure and content do not change
to support the federated file system. Other applications may run on the local file systems without realizing that the same file system is part of one or more
FedFS instances. If the distributed application permits, nodes
can dynamically join or leave the federation at any time,
with no modifications required to the local file systems.
FedFS is implemented as an I/O library over VIA,
which supports remote memory operations. The
applicability and performance of the federated file
system architecture are evaluated by building a distributed NFS file server.
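The abstract does not say how FedFS resolves a global name to a node. One plausible way to get location-independent naming with dynamic membership is consistent hashing; the sketch below assumes that technique, and `FederationRing`, `join`, `leave`, and `locate` are hypothetical names, not FedFS's API.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 64-bit position on the ring derived from SHA-1.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class FederationRing:
    """Map a global path to the node that manages it (hypothetical scheme)."""

    def __init__(self, nodes=()):
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        bisect.insort(self._ring, (_h(node), node))

    def leave(self, node: str) -> None:
        self._ring.remove((_h(node), node))

    def locate(self, path: str) -> str:
        """Return the first node clockwise from the path's ring position."""
        if not self._ring:
            raise LookupError("empty federation")
        i = bisect.bisect(self._ring, (_h(path), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

Consistent hashing is used here because a join or leave only moves the names between the changed node and its predecessor on the ring, instead of rehashing every name; a real deployment would also add virtual nodes to even out load.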
An Iterative Technique for Distilling a Workload's Important Performance Information
Zachary Kurmas, Georgia Tech; Kimberly Keeton, HP Labs
Larger Disk Blocks or Not?
Steve McCarthy, Mike Leis, and Steve Byan, Maxtor Corporation
The recent annual compound growth rate of disk drive areal density has been 100%, a doubling of capacity every year. This growth rate is faster than Moore’s Law: advances in disk technology have been outpacing advances in semiconductor technology. Part of the reason for this spectacular growth rate is that areal density is a two-dimensional problem. Succeeding product generations increase both the number of tracks per inch (TPI) radially and the number of linear bits per inch (BPI) circumferentially. However, both parameters are facing technical challenges that may slow the rate of capacity growth. In this paper, we will briefly examine some of the obstacles to increased BPI and propose an increase in sector size as an aid to surmounting them.
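The capacity argument for larger sectors can be illustrated with simple arithmetic. Assuming each sector carries a fixed overhead for sync, address marks, ECC, and gaps (the 64-byte figure below is a placeholder, not a number from the paper), format efficiency is S / (S + O) for sector size S and overhead O. The same snippet shows that 100% compound annual growth scales density by 2^n after n years.

```python
def format_efficiency(sector_bytes: int, overhead_bytes: int) -> float:
    """User-data fraction of a track when every sector carries a fixed overhead.

    The overhead figure is a placeholder assumption (sync, address mark, ECC, gap).
    """
    return sector_bytes / (sector_bytes + overhead_bytes)


def density_multiplier(years: int, annual_growth: float = 1.0) -> float:
    """Compound growth: 100% per year (annual_growth=1.0) doubles density yearly."""
    return (1.0 + annual_growth) ** years


ASSUMED_OVERHEAD = 64  # bytes per sector; hypothetical, for illustration only

for size in (512, 4096):
    print(size, round(format_efficiency(size, ASSUMED_OVERHEAD), 3))
print(density_multiplier(5))
```

Under that assumption, the script prints 0.889 for 512-byte sectors and 0.985 for 4096-byte sectors, then 32.0 for five years of annual doubling: a fixed per-sector cost is amortized over more user data as sectors grow.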
Lazy Parity Update: A Technique to Improve Write I/O Performance of Disk Array Tolerating Double Disk Failures
Young Jin Nam, Dae-Woong Kim, Tae-Young Choe, and Chanik Park, Pohang University of Science and Engineering, Kyungbuk, Republic of Korea
The Armada Framework for Parallel I/O on Computational Grids
Ron Oldfield and David Kotz, Dartmouth College
IBM Storage Tank:
A Distributed Storage System
D. A. Pease, R. M. Rees, W. C. Hineman, D. L. Plantenberg,
R. A. Becker-Szendy, R. Ananthanarayanan, M. Sivan-Zimet,
C. J. Sullivan, IBM Almaden Research Center; R. C. Burns, Johns Hopkins University;
D. D. E. Long, University of California, Santa Cruz
IBM Storage Tank is a SAN-based distributed object storage system for use in heterogeneous
environments. It provides performance comparable to that of file systems built on bus-attached,
high-performance storage, as well as advanced storage and data management functions. It is
designed to be highly available and scalable. The Storage Tank project has been underway at
IBM's Almaden Research Center for several years.
Storage Tank is designed to work with any Storage Area Network architecture, as well as with any
SAN storage hardware. (It currently runs on both Fibre Channel and iSCSI SANs.) It is also
designed to be portable to essentially any host system architecture.
This paper provides a high-level overview of Storage Tank's design and features.
Data Placement Based on the Seek Time Analysis of a MEMS-based Storage Device
Zachary N. J. Peterson, Scott A. Brandt, Darrell D. E. Long, University of California, Santa Cruz
Reducing access times to secondary I/O devices has
long been the focus of many systems researchers.
With traditional disk drives, access time is the sum
of transfer time, seek time and rotational latency,
so many techniques to minimize these factors,
such as ordering I/O requests or intelligently
placing data, have been developed. MEMS-based
storage devices are seen by many as a replacement
or an augmentation for modern disk drives, but algorithms
for reducing access time for MEMS-based
storage are still poorly understood. These devices,
based on MicroElectroMechanical systems (MEMS),
use thousands of active read/write heads working in
parallel on a non-rotating magnetic substrate, eliminating
rotational latency from the access time equation.
This leaves seek time as the dominant variable.
Therefore, new data layout techniques that minimize seek time
by exploiting the unique seek time characteristics
of a MEMS-based storage device can be developed.
This paper begins to examine the access qualities of
a MEMS-based storage device, and based on experimental
simulation, develops an understanding of the
seek time characteristics of such a device. These
characteristics then allow us to identify equivalent
regions in which to place data for improved access.
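The access-time argument can be written out as follows. The seek-time term on the second line follows a model common in the MEMS storage literature, assumed here rather than taken from this paper, in which the media sled moves in X and Y concurrently, so the seek time is the larger of the two, with settling time charged to X:

```latex
\begin{aligned}
T_{\text{access}}^{\text{disk}} &= T_{\text{seek}} + T_{\text{rot}} + T_{\text{xfer}} \\
T_{\text{access}}^{\text{MEMS}} &= T_{\text{seek}} + T_{\text{xfer}},
  \qquad T_{\text{seek}} = \max\left(t_x + t_{\text{settle}},\; t_y\right)
\end{aligned}
```

Under this model, all target positions whose X and Y distances yield the same maximum share one seek time, which is one way to read the notion of equivalent regions.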
Logistical Networking Research and the Network Storage Stack
James S. Plank, Micah Beck, and Terry Moore,
University of Tennessee
Enhancing NFS Cross-Administrative Domain Access
Joseph Spadavecchia and Erez Zadok, Stony Brook University
The access model of exporting NFS volumes to clients
suffers from two problems. First, the server depends on
the client to specify the user credentials to use and has
no flexible mechanism to map or restrict the credentials
given by the client. Second, there is no mechanism to
hide data from users who do not have privileges to access
it. Although NFSv4 promises to fix the first problem using
universal identifiers, it does not provide a mechanism
for hiding data and is not expected to be in wide use for
a long time.
We address these problems with a combination of two
solutions. First, range-mapping is a mechanism that allows
the NFS server to restrict and flexibly map the credentials
set by the client. Second, file-cloaking allows the
server to control the data a client is able to view or access
beyond normal Unix semantics. Our design is compatible
with all versions of NFS, including NFSv4. We have
implemented this work in Linux and made changes only
to the NFS server code; client-side NFS and the NFS protocol
remain unchanged. Our evaluation shows a minimal
average performance overhead and, in some cases,
an end-to-end performance improvement.
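The abstract gives no data structures, so the following sketch only illustrates the two ideas under simplifying assumptions. A range map translates a contiguous block of client UIDs onto server UIDs and squashes everything else to an assumed `nobody` UID of 65534. Cloaking filters a directory listing down to entries the mapped user could read, using only the owner and other read bits. `RangeMap`, `map_uid`, `cloaked_listing`, and `NOBODY` are hypothetical and do not reflect the authors' Linux implementation.

```python
import stat
from dataclasses import dataclass
from typing import Optional

NOBODY = 65534  # conventional "nobody" UID on many Linux systems; assumed squash target


@dataclass(frozen=True)
class RangeMap:
    """Map client UIDs [client_lo, client_lo + count) onto server UIDs from server_lo."""
    client_lo: int
    server_lo: int
    count: int

    def apply(self, uid: int) -> Optional[int]:
        if self.client_lo <= uid < self.client_lo + self.count:
            return self.server_lo + (uid - self.client_lo)
        return None


def map_uid(uid: int, rules: list[RangeMap]) -> int:
    """The first matching rule wins; any UID no rule covers is squashed to NOBODY."""
    for rule in rules:
        mapped = rule.apply(uid)
        if mapped is not None:
            return mapped
    return NOBODY


def cloaked_listing(entries: list[tuple[str, int, int]], uid: int) -> list[str]:
    """Return only the names the (already mapped) user could read.

    Entries are (name, owner_uid, mode). Owner-read applies to the owner and
    other-read to everyone else; group bits are ignored in this simplification.
    """
    visible = []
    for name, owner, mode in entries:
        read_bit = stat.S_IRUSR if owner == uid else stat.S_IROTH
        if mode & read_bit:
            visible.append(name)
    return visible
```

Because unmapped UIDs such as 0 fall through to `NOBODY`, the sketch also behaves like root squashing. Group permissions, which a real server would check, are deliberately omitted.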
StorageAgent: An Agent-based Approach for Dynamic Resource Sharing in a Storage Service Provider (SSP) Infrastructure
Sandeep Uttamchandani, IBM Almaden Research Center
In an SSP infrastructure, the resources of the storage server, namely cache, memory, and CPU, are shared in an ad hoc
manner among the clients. These resources play an important role in determining the overall throughput and
latency of data access. In this paper, we propose StorageAgent, a systematic, secure and efficient approach for
distributing resources. Built on agent-based semantics for dynamic resource sharing, StorageAgent achieves the
following goals. First, available resources are used efficiently because there are well-defined semantics for
lending and reclaiming resources. Second, data security is ensured because access to borrowed resources is controlled
solely by trusted agents. Third, fine-grained control and metering of the resources used by individual clients is possible.
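The lending and reclaiming semantics are not defined in the abstract, so the sketch below assumes one simple version. Each client has a guaranteed reservation, idle reserved capacity may be lent to other clients, and a client asking for units within its own reservation reclaims lent-out units from borrowers when free capacity runs short. `ResourcePool`, `request`, `release`, and `loaned` are hypothetical, and the agents and trust model the abstract relies on for security are not modeled.

```python
class ResourcePool:
    """Hypothetical lend/reclaim bookkeeping for one server resource (e.g. cache pages)."""

    def __init__(self, capacity: int, reservations: dict[str, int]):
        if sum(reservations.values()) > capacity:
            raise ValueError("reservations exceed capacity")
        self.capacity = capacity
        self.reserved = dict(reservations)          # guaranteed share per client
        self.in_use = {c: 0 for c in reservations}  # metered usage per client

    def free(self) -> int:
        return self.capacity - sum(self.in_use.values())

    def loaned(self, client: str) -> int:
        """Units a client holds beyond its own reservation, i.e. borrowed units."""
        return max(0, self.in_use[client] - self.reserved[client])

    def request(self, client: str, amount: int) -> int:
        """Grant up to `amount` units and return how many were granted."""
        within = max(0, min(amount, self.reserved[client] - self.in_use[client]))
        # Reclaim borrowed units from other clients if the guaranteed share is not free.
        shortfall = within - self.free()
        for other in self.in_use:
            if shortfall <= 0:
                break
            take = min(self.loaned(other), shortfall)
            self.in_use[other] -= take
            shortfall -= take
        self.in_use[client] += within
        # Anything beyond the reservation is borrowed only from currently free capacity.
        borrowed = min(amount - within, self.free())
        self.in_use[client] += borrowed
        return within + borrowed

    def release(self, client: str, amount: int) -> None:
        self.in_use[client] -= min(amount, self.in_use[client])
```

The per-client `in_use` counters double as a simple form of the metering the abstract mentions.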
Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System
An-I A. Wang, Peter Reiher, and Gerald J. Popek, University of California, Los Angeles; Geoffrey H. Kuenning, Harvey Mudd College
Conquest is a disk/persistent-RAM hybrid file system that
is incrementally deployable and realizes most of the benefits
of cheaply abundant persistent RAM. Conquest consists
of two specialized and simplified data paths for in-core
and on-disk storage and outperforms popular disk-based
file systems by 43% to 97%.
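The abstract does not say which data takes which path. The sketch below assumes, for illustration, that metadata and files below a size threshold live in persistent RAM while larger file contents go to disk. Plain dictionaries stand in for both media, and `HybridStore` and `threshold` are hypothetical.

```python
class HybridStore:
    """Hypothetical two-path store: metadata and small files in persistent RAM,
    large file contents on disk. Dicts stand in for both media."""

    def __init__(self, threshold: int):
        self.threshold = threshold        # bytes; assumed small/large cutoff
        self.ram: dict[str, bytes] = {}   # in-core path (persistent RAM)
        self.disk: dict[str, bytes] = {}  # on-disk path for large file data
        self.meta: dict[str, int] = {}    # per-file metadata (size), kept in RAM

    def write(self, name: str, data: bytes) -> None:
        # Drop any old copy so a file lives on exactly one data path.
        self.ram.pop(name, None)
        self.disk.pop(name, None)
        self.meta[name] = len(data)
        target = self.ram if len(data) < self.threshold else self.disk
        target[name] = data

    def read(self, name: str) -> bytes:
        return self.ram[name] if name in self.ram else self.disk[name]

    def location(self, name: str) -> str:
        return "ram" if name in self.ram else "disk"
```

Each write re-evaluates the file's size, so a file that grows past the threshold moves to the disk path while its metadata stays in RAM.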