
Introduction

Looking at industry projections for disk drive technology over the next 7 years, we see a familiar, expected trend: the capacity of individual disk enclosures will continue to double every 9-18 months[10]. File systems have successfully coped with this trend for two decades with relatively minor changes, and we might easily assume that file systems will continue to cope with exponential capacity growth for another two decades. But a closer look at three associated hardware trends sets off alarm bells (see Table 1).


Table 1: Projected disk hardware trends[10].

                      2006    2009    2013    Change
  Capacity (GB)        500    2000    8000       16x
  Bandwidth (Mb/s)    1000    2000    5000        5x
  Seek time (ms)         8     7.2     6.5      1.2x


First, disk I/O bandwidth is not keeping up with capacity. Between 2006 and 2013, disk capacity is projected to increase by 16 times, but disk bandwidth by only 5 times. As a result, it will take about 3 times longer to read an entire disk. It's as if your milkshake got 3 times bigger, but the size of your straw stayed the same. Second, seek time will stay almost flat over the next 7 years, improving by a pitiful factor of only 1.2. The performance of workloads with any significant number of seeks (i.e., most workloads) will not scale with the capacity of the disk. Third, the per-bit error rate is not improving as fast as disk capacity is growing. Simply put, every time disk capacity doubles, the per-bit error rate must be cut in half to keep overall errors per disk the same--needless to say, that is not happening. The absolute chance of an error occurring somewhere on a disk increases as the size of the disk grows.
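To make the bandwidth gap concrete, the sketch below (our own illustration, using only the figures from Table 1) computes the time to read an entire disk end to end in 2006 versus 2013.

#include <stdio.h>

int main(void)
{
    /* Figures from Table 1; the arithmetic is illustrative only. */
    double cap_2006_gb = 500,  bw_2006_mbps = 1000;  /* capacity, bandwidth */
    double cap_2013_gb = 8000, bw_2013_mbps = 5000;

    /* 1 GB = 8000 megabits (decimal units), so t = cap * 8000 / bw. */
    double t_2006 = cap_2006_gb * 8000 / bw_2006_mbps;  /* ~4000 s  */
    double t_2013 = cap_2013_gb * 8000 / bw_2013_mbps;  /* ~12800 s */

    printf("2006: %6.0f s (%.1f min)\n", t_2006, t_2006 / 60);
    printf("2013: %6.0f s (%.1f h)\n",   t_2013, t_2013 / 3600);
    printf("slowdown: %.1fx\n", t_2013 / t_2006);       /* ~3.2x */
    return 0;
}

A full sequential read grows from roughly an hour to over three and a half hours, even before any seeks are counted.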

What do these trends mean for file systems? Any operation that is O(size of file system) will take longer to complete--at least three times longer, factoring in only the effect of lower relative bandwidth. Flat seek time will further compromise the ability of the file system to scale to larger disks. The number of disk corruption events per disk will increase due to hardware-related media errors alone. The end result is that file system repair will take longer at the same time that it becomes more frequent--what we call the "fsck time crunch."
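A back-of-the-envelope model (our own, not from the paper) shows why the crunch is worse than the 3x bandwidth figure alone: once seeks are included, fsck time grows with capacity while seek time barely improves. The 1% metadata fraction and one-seek-per-64-KB figure below are invented for illustration.

#include <stdio.h>

/* Hypothetical fsck-time model: sequential metadata scan plus one
 * seek per out-of-order 64 KB metadata read. Both constants are
 * assumptions made for illustration only. */
static double fsck_seconds(double cap_gb, double bw_mbps, double seek_ms)
{
    double meta_mb = cap_gb * 1000 * 0.01;      /* 1% metadata, in MB */
    double scan_s  = meta_mb * 8 / bw_mbps;     /* transfer time      */
    double n_seeks = meta_mb * 1024 / 64;       /* one per 64 KB      */
    return scan_s + n_seeks * seek_ms / 1000;
}

int main(void)
{
    printf("2006: %.0f s\n", fsck_seconds(500,  1000, 8.0));  /* ~680 s  */
    printf("2013: %.0f s\n", fsck_seconds(8000, 5000, 6.5));  /* ~8448 s */
    return 0;
}

Under these assumptions, seek-dominated fsck time grows by roughly 12x over the period, not 3x.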

Our proposed solution, chunkfs, divides up the on-disk file system format into individually repairable chunks with strong fault isolation boundaries. Each chunk can be individually checked and repaired with only occasional, limited references to data outside of itself. Cross-chunk references, e.g., for files larger than a single chunk, are rare and follow strict rules which speed up consistency checks, such as requiring both forward and back pointers. Our measurements show that write activity already tends to be concentrated in relatively small subsets of the disk at any given time, making on-line checking, repair, and defragmentation of idle chunks exceptionally fast and simple.
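The section above does not specify an on-disk format, but a minimal sketch of the stated rule--every cross-chunk reference carries both a forward and a back pointer, so each chunk can be checked against only the matching records in its neighbors--might look like the following. All type and field names here are hypothetical; chunkfs's actual layout may differ.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cross-chunk reference records. The source chunk stores
 * a forward pointer; the destination chunk stores a matching back
 * pointer to the inode that references it. */
struct fwd_ref {
    uint32_t dst_chunk;   /* chunk holding the file's continuation    */
    uint64_t dst_inode;   /* continuation inode within that chunk     */
};

struct back_ref {
    uint32_t src_chunk;   /* chunk that references this continuation  */
    uint64_t src_inode;   /* inode there holding the forward pointer  */
};

/* Per-chunk fsck verifies each cross-chunk reference by reading only
 * the matching record in the other chunk, not the whole chunk, so
 * repair stays O(chunk size) rather than O(file system size). */
static bool refs_match(uint32_t src_chunk, uint64_t src_inode,
                       const struct fwd_ref *f, const struct back_ref *b)
{
    return b->src_chunk == src_chunk && b->src_inode == src_inode
        && f->dst_chunk != src_chunk;   /* a true cross-chunk link */
}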
