Check out the new USENIX Web site.

Continuation inodes

The first problem we encounter is how to store files larger than the space available in a chunk. We could allocate blocks from another chunk and have direct pointers to them from our chunk, but that would make file system repair no longer local to the chunk. To understand why, think about repairing the block allocation bitmap. If only inodes inside a chunk can have pointers to blocks in that chunk, then we only have to check the inodes inside the chunk to determine whether a particular block is truly allocated or not. But if we allow inodes in other chunks to have pointers to blocks in our chunk, we have to check all inodes in all chunks and we are back to square one.

Our solution to this problem is called continuation inodes. We pre-allocate forward and back pointers in every inode. When a file outgrows a chunk, we allocate another inode in another chunk, mark it as a continuation inode, and fill out the forward and back pointers to create a doubly linked list. Then we allocate blocks from the continuation inode's chunk and point to them from the continuation inode. Continuation inodes are checked in an additional final pass of fsck, to check that the forward and back pointers agree. Directories that outgrow their chunk are handled in the same way as files.

The next obvious problem is hard links between directories and files in different chunks. In order for fsck to quickly check the number of links to an inode, the inode and all directory entries referencing it must be in the same chunk. While we prefer to allocate directories and the files they link to in the same chunk, eventually we have to allocate from a different chunk, creating a problem for our goal of keeping link count calculation local to a chunk. Another more direct source of cross-chunk hard links are files with hard links from multiple directories in different chunks--clearly the inode can't be in more than one chunk.

Our solution is to allocate a continuation inode for the linked-to inode in the same chunk as the directory. If the directory's chunk is full, we allocate a continuation inode for both directory and inode in a chunk with free space. An inode's link count is the sum of the link counts of all its continuation inodes (this link count can be cached in the ``parent'' inode).

Valerie Henson 2006-10-18