FAST 2002 Abstract
SnapMirror®: File System Based Asynchronous Mirroring
for Disaster Recovery
Hugo Patterson, Stephen Manley, Mike Federwisch, Dave Hitz, Steve Kleiman, Shane Owara
Network Appliance Inc.
Computerized data has become critical to the survival of
an enterprise. Companies must have a strategy for recovering
their data should a disaster such as a fire destroy the
primary data center. Current mechanisms offer data managers
a stark choice: rely on affordable tape but risk the
loss of a full day of data and face many hours or even
days to recover, or have the benefits of a fully synchronized
on-line remote mirror, but pay steep costs in both
write latency and network bandwidth to maintain the
mirror. In this paper, we argue that asynchronous mirroring,
in which batches of updates are periodically sent to
the remote mirror, can let data managers find a balance
between these extremes. First, by eliminating the write
latency issue, asynchrony greatly reduces the performance
cost of a remote mirror. Second, by storing up
batches of writes, asynchronous mirroring can avoid
sending deleted or overwritten data and thereby reduce
network bandwidth requirements. Data managers can
tune the update frequency to trade network bandwidth
against the potential loss of more data. We present SnapMirror,
an asynchronous mirroring technology that leverages
file system snapshots to ensure the consistency
of the remote mirror and optimize data transfer. We use
traces of production filers to show that even updating an
asynchronous mirror every 15 minutes can reduce data
transferred by 30% to 80%. We find that exploiting file
system knowledge of deletions is critical to achieving
any reduction for no-overwrite file systems such as
WAFL and LFS. Experiments on a running system show
that using file system metadata can reduce the time to
identify changed blocks from minutes to seconds compared
to purely logical approaches. Finally, we show that
using SnapMirror to update every 30 minutes increases
the response time of a heavily loaded system by only 22%.
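The block-selection idea summarized above can be illustrated with a toy model. This is a minimal sketch, not SnapMirror's implementation: it assumes a hypothetical no-overwrite file system in which every write goes to a newly allocated block, and it treats each snapshot as nothing more than the frozen set of allocated block numbers. Under those assumptions, diffing two snapshots' allocation maps yields exactly the blocks the mirror needs, and data that is overwritten or deleted within one update interval never has to be sent. The names `ToyNoOverwriteFS` and `mirror_update` are illustrative only.

```python
from itertools import count


class ToyNoOverwriteFS:
    """Hypothetical toy model of a no-overwrite (WAFL/LFS-style) file system.

    Every write lands in a freshly allocated block and frees the block that
    previously held the same (file, offset). Block numbers are never reused,
    standing in for the real rule that a block referenced by a snapshot is
    not reallocated while that snapshot exists.
    """

    def __init__(self):
        self._next_block = count()
        self.block_of = {}  # (file, offset) -> physical block number
        self.data = {}      # physical block number -> contents

    def write(self, file, offset, contents):
        old_block = self.block_of.get((file, offset))
        if old_block is not None:
            del self.data[old_block]  # no overwrite in place: old copy is freed
        new_block = next(self._next_block)
        self.block_of[(file, offset)] = new_block
        self.data[new_block] = contents

    def delete(self, file):
        for key in [k for k in self.block_of if k[0] == file]:
            del self.data[self.block_of.pop(key)]

    def snapshot(self):
        """Freeze the current allocation map as a set of block numbers."""
        return frozenset(self.data)


def mirror_update(base_map, new_map):
    """Diff two allocation maps.

    Returns (blocks_to_send, blocks_to_free): blocks allocated in the new
    snapshot but not the base must be copied to the mirror; blocks allocated
    in the base but not the new snapshot can be freed there with no data sent.
    """
    return sorted(new_map - base_map), sorted(base_map - new_map)


# One update interval on the toy file system.
fs = ToyNoOverwriteFS()
fs.write("a", 0, b"v1")
fs.write("b", 0, b"x")
base = fs.snapshot()          # allocated blocks: {0, 1}

for version in (b"v2", b"v3", b"v4"):
    fs.write("a", 0, version)  # three overwrites of the same offset
fs.write("c", 0, b"temp")      # short-lived file...
fs.delete("c")                 # ...deleted before the next update
fs.delete("b")
new = fs.snapshot()

send, free = mirror_update(base, new)
print(send, free)              # prints: [4] [0, 1]
```

Five writes happen during the interval, yet only block 4 is sent, and the mirror learns of both deletions from the allocation maps alone. The toy ignores metadata blocks, the transfer protocol, and snapshot deletion.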
dollars depending on the size of the enterprise and the
role of the data. With increasing frequency, companies
are instituting disaster recovery plans to ensure appropriate
data availability in the event of a catastrophic failure
or disaster that destroys a site (e.g., flood, fire, or earthquake).
It is relatively easy to provide redundant server
and storage hardware to protect against the loss of physical
resources. Without the data, however, the redundant
hardware is of little use.
The Proceedings are published as a collective work, © 2002 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.