On-line partial fsck

At any given time, a relatively small subset of a file system is write-busy, that is, its metadata is being modified. This is partly because most file systems try to keep writes grouped on disk for better performance, and partly because disks simply aren't capable of writing to the entire platter at once. We measured the distribution of metadata updates across the file system by instrumenting ext2 to record, for each one-second time slice, which block groups had metadata updates. Over 50 minutes of active use on a development laptop (web browsing, file editing, and kernel compilation), every block group was clean 98-100% of the time. While we filled the file system as quickly as possible with an artificially constructed workload using both dd and cp -r, every block group was clean at least 75% of the time, and most were clean far more often. Although laptop disks are not particularly high-performance, these results confirm our intuition that metadata updates tend to be localized in both time and space: only a few block groups are being actively modified at any given time.
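The clean-fraction metric above can be sketched as follows. This is a minimal illustration, not the actual ext2 instrumentation. It assumes a hypothetical trace format of `(timestamp_seconds, block_group)` pairs and a hypothetical helper `clean_fractions`. A block group counts as dirty in a one-second slice if at least one metadata update to it falls inside that slice, and as clean otherwise.

```python
from collections import defaultdict


def clean_fractions(updates, num_groups, duration_s):
    """Return, for each block group, the fraction of one-second slices in
    which that group had no metadata update.

    updates: iterable of (timestamp_seconds, block_group) pairs; this trace
             format is a hypothetical stand-in for whatever was logged.
    duration_s: length of the observation window in whole seconds (> 0).
    """
    num_slices = int(duration_s)
    # group -> set of one-second slice indices that saw a metadata update
    dirty_slices = defaultdict(set)
    for t, group in updates:
        if 0 <= t < num_slices:           # ignore events outside the window
            dirty_slices[group].add(int(t))
    # A group is clean in every slice where it was not dirty.
    return [1.0 - len(dirty_slices[g]) / num_slices for g in range(num_groups)]


# Example: three block groups observed for 10 seconds.
trace = [(0.2, 0), (0.7, 0), (3.5, 1), (9.9, 0)]
print(clean_fractions(trace, num_groups=3, duration_s=10))
```

Here group 0 is dirty in slices 0 and 9, and group 1 is dirty only in slice 3. Group 2 is never touched. Multiple updates within the same slice count once, which matches the "was this group modified during this second" question rather than counting individual writes.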

We predict that many chunks will be idle with respect to metadata writes most of the time, and we can take advantage of this to check chunks incrementally while the file system is on-line. Chunks that are too busy to check on-line (a relatively small subset) can be checked quickly and completely at the next mount. As a proof of concept, we implemented for ext2 a "dirty bit" indicating whether the file system is currently being modified, and found it a relatively easy task [7]. On-line repair will be more difficult: it will require careful handling of open files and management of kernel structures, and it may not be worth solving if the file system needs to be off-line for only a few minutes to complete the repair. In addition, in the event of a crash, the dirty bits indicate which chunks need to be recovered and which can be skipped, shortening recovery time significantly. We can also check a few randomly chosen chunks at every mount; over time every chunk gets checked, while the incremental cost at each mount stays low. This kind of scrubbing is especially important given the prevalence of latent (invisible) faults and their effects on long-term data preservation [3].
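The per-chunk dirty-bit scheme described above can be modeled in a few lines. This is a minimal in-memory sketch, not the ext2 patch cited above. The class `ChunkDirtyBits`, the function `chunks_to_check_at_mount`, and the `scrub_count` parameter are hypothetical names. The sketch also assumes that a real implementation would persist each bit to disk before modifying the chunk's metadata. Idle (clean) chunks are eligible for on-line checking. At mount, every chunk still marked dirty is checked, plus a few randomly chosen clean chunks as background scrubbing.

```python
import random


class ChunkDirtyBits:
    """Hypothetical model of one dirty bit per chunk."""

    def __init__(self, num_chunks):
        self.dirty = [False] * num_chunks

    def begin_metadata_write(self, chunk):
        # Mark the chunk dirty before touching its metadata (a real file
        # system would have to write this bit to disk first).
        self.dirty[chunk] = True

    def end_metadata_writes(self, chunk):
        # Clear the bit once the chunk's metadata is consistent and it is idle.
        self.dirty[chunk] = False

    def online_check_candidates(self):
        # Only idle chunks are checked on-line; busy ones are deferred.
        return [c for c, d in enumerate(self.dirty) if not d]


def chunks_to_check_at_mount(dirty_on_disk, scrub_count, rng=random):
    """Chunks whose dirty bit survived (e.g. after a crash) must be checked;
    additionally scrub up to scrub_count randomly chosen clean chunks."""
    must_check = [c for c, d in enumerate(dirty_on_disk) if d]
    clean = [c for c, d in enumerate(dirty_on_disk) if not d]
    scrub = rng.sample(clean, min(scrub_count, len(clean)))
    return sorted(set(must_check) | set(scrub))


bits = ChunkDirtyBits(8)
bits.begin_metadata_write(2)
bits.begin_metadata_write(5)
print(bits.online_check_candidates())  # [0, 1, 3, 4, 6, 7]
print(chunks_to_check_at_mount(bits.dirty, scrub_count=2, rng=random.Random(0)))
```

Because the scrubbed chunks are drawn uniformly at random at each mount, every chunk is eventually checked without any single mount paying for a full fsck. A round-robin cursor stored on disk would be a deterministic alternative with the same amortized cost.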

Valerie Henson 2006-10-18