
Chunkfs: Using divide-and-conquer to improve file system reliability and repair

Val Henson

Open Source Technology Center

Intel Corporation

Arjan van de Ven

Open Source Technology Center

Intel Corporation

Amit Gud

Kansas State University

gud@cis.ksu.edu

Zach Brown

Oracle, Inc.

zach.brown@oracle.com

Abstract:

The absolute time required to check and repair a file system is increasing, because disk capacity is growing faster than disk bandwidth while seek time remains almost unchanged. At the same time, file system repair is becoming more common: the per-bit error rate of disks is not dropping as fast as the number of bits per disk is growing, so each disk suffers more errors. With existing file systems, a single corrupted metadata block requires the entire file system to be unmounted, checked, and repaired. This process takes hours or days to complete, and the data is completely unavailable the whole time. The resulting "fsck time crunch" is already making file systems of only a few terabytes impractical to administer. We propose chunkfs, which divides on-disk file system data into small, individually repairable fault-isolation domains while preserving normal file system semantics.



Valerie Henson 2006-10-18