Check out the new USENIX Web site.

Continuation inode implementation notes

The implementation of continuation inodes must avoid some obvious pitfalls. First, we must avoid excessive numbers of continuation inodes, both in terms of the total number in the file system and number per file. One especially painful scenario is what we call the doubling back problem, which will happen when file grows while free space moves around the file system due to other activity. We may end up beginning allocation in chunk A, moving to chunk B, and then returning to chunk A because more space has been freed up in the meantime. If we are not careful, we could end up with one data block per inode.

Our solution is to implement sparse files and allow each continuation inode to encompass an arbitrary set of blocks in the file. This results in a maximum overhead of one continuation inode per file per chunk, which is the lowest upper bound theoretically possible (consider the case of a single file which fills the entire file system).

In order to make intra-file seek times reasonable, we may have to implement a lookup structure of some sort, depending on how fragmented files end up being in typical usage. The doubly linked list structure in each inode could be easily replaced with a tree node or hash structure to speed offset location.

The size of chunks should be small to speed up per-chunk fsck, but large enough that the cross-chunk pass is still fast (i.e., continuation inodes should be rare). By definition, large files requiring continuation inodes are rare because you cannot store many large files before you run out of space on the file system. A quick survey of several Linux developers' laptop file systems during the 2006 Linux file systems workshop[6] found that 99.5% or more of files were less than 1MB in size. We guesstimate that an optimal chunk size would be on the order of 1/100th of the total file system size. With regard to multiple hard links, keep in mind that the vast majority of files have exactly one link and will be allocated in the same chunk as the parent directory. Our informal laptop file system survey found that only about 1% of file had more than one link.

Valerie Henson 2006-10-18