Embedded Inodes and Explicit Grouping: Exploiting Bandwidth for Small Files
By Gregory R. Ganger and M. Frans Kaashoek, MIT
Summary by Peter Collinson
This paper was the first talk of the refereed track. It was a good start, too, because the paper had won the conference's Best Paper Award. All three talks in this opening session were excellent, setting the tone for a very interesting conference.
This paper presented C-FFS (Co-locating Fast File System), a new way of improving file system performance for small files. Most files on most systems are small, so the topic is of great interest to us all. Although many aspects of disk technology have improved over the years, disk access times have improved only slowly. Disks move large chunks of data quickly, but they are slow at repositioning their heads before reading or writing data. For some time, file systems have tried to keep the parts of each file in the same area of the disk so that they can make use of the fast transfer rates disks provide. These attempts have worked well for large files, but have provided little improvement for small files.
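The gap between repositioning and transfer can be made concrete with a simple cost model in which every disk request pays a fixed positioning delay before data moves at the media rate. This is a minimal sketch; the 10 ms positioning cost and 10 MB/s transfer rate are assumed round numbers for illustration, not figures from the paper.

```python
# Hypothetical disk parameters, chosen only for illustration (not from the paper).
POSITIONING_MS = 10.0      # assumed seek + rotational latency per request
TRANSFER_MB_PER_S = 10.0   # assumed sustained media transfer rate


def request_time_ms(size_kb: float) -> float:
    """Time for one request: fixed positioning cost plus transfer time."""
    transfer_ms = (size_kb / 1024.0) / TRANSFER_MB_PER_S * 1000.0
    return POSITIONING_MS + transfer_ms


def effective_bandwidth_mb_per_s(size_kb: float) -> float:
    """Bandwidth actually delivered when each request moves size_kb of data."""
    return (size_kb / 1024.0) / (request_time_ms(size_kb) / 1000.0)


for size in (4, 64, 1024):
    print(f"{size:5d} KB request: {effective_bandwidth_mb_per_s(size):.2f} MB/s")
```

With these assumed numbers, a 4 KB request delivers well under a tenth of the transfer rate, while a 1 MB request gets most of it. Small files pay almost entirely for positioning, which is the cost C-FFS sets out to reduce.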
The authors use two techniques to improve performance. The first, embedded inodes, moves most inodes into directories, eliminating a physical level of indirection between a name and its inode. Traditionally, a UNIX directory entry contains a name and an inode number, which is used to locate the file's inode in a separate inode area on the partition. With C-FFS, the inode number is replaced by the inode itself. C-FFS therefore eliminates the separate read of the inode when accessing a file, because the inode arrives with the directory block that was already read during name lookup; it also saves an additional write when creating a file.
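To make the saved read concrete, here is a toy model, not C-FFS code, in which a `Disk` object counts read requests and caching is ignored so every lookup goes to disk. The names `Inode`, `Disk`, `INODES_PER_BLOCK`, and `INODE_TABLE_START` are hypothetical, and the layout is deliberately simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical toy model for illustration; not the actual C-FFS on-disk format.


@dataclass
class Inode:
    size: int
    data_blocks: List[int]


@dataclass
class Disk:
    """Maps block addresses to block contents and counts read requests."""
    blocks: Dict[int, object] = field(default_factory=dict)
    reads: int = 0

    def read(self, addr: int) -> object:
        self.reads += 1
        return self.blocks[addr]


INODES_PER_BLOCK = 32      # assumed packing of the separate inode area
INODE_TABLE_START = 1000   # assumed address of the first inode block


def lookup_traditional(disk: Disk, dir_block: int, name: str) -> Inode:
    """FFS-style entry holds an inode number, so the inode needs a second read."""
    entries = disk.read(dir_block)                      # read 1: directory block
    inum = entries[name]
    inode_block = INODE_TABLE_START + inum // INODES_PER_BLOCK
    return disk.read(inode_block)[inum]                 # read 2: inode block


def lookup_embedded(disk: Disk, dir_block: int, name: str) -> Inode:
    """Embedded-inode entry holds the inode itself, so one read suffices."""
    entries = disk.read(dir_block)                      # read 1: directory block
    return entries[name]


ino = Inode(size=2048, data_blocks=[5000])
ffs = Disk({10: {"notes.txt": 7}, INODE_TABLE_START: {7: ino}})
cffs = Disk({10: {"notes.txt": ino}})
lookup_traditional(ffs, 10, "notes.txt")
lookup_embedded(cffs, 10, "notes.txt")
print(ffs.reads, cffs.reads)  # prints: 2 1
```

The same accounting applies to file creation: the traditional layout must write both the directory block and an inode block, whereas the embedded layout writes only the directory block.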
Of course, placing the inode in the directory conflicts with several aspects of UNIX file system semantics, and C-FFS takes steps to preserve them. Perhaps the most obvious conflict involves hard links, which depend on multiple directory entries referring to the same inode. Greg observed that, in practice, most systems have very few hard links, so the solution does not need to be especially efficient. In C-FFS, if you create a hard link in the same directory that holds the inode, the new entry can simply be a pointer to it. Problems arise only when an inode is referenced from separate directories. This case is handled by placing the inode on disk in a freestanding location, i.e., not inside any directory. There are other problems with maintaining traditional UNIX semantics, and I refer you to the paper for their solutions.
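The hard-link rule can be sketched with an in-memory model in which a directory entry either embeds an inode, points at another entry in the same directory, or refers by number to a freestanding inode. This is one plausible reading of the scheme, not the paper's implementation; `LocalRef`, `ExternalRef`, and `FileSystem` are hypothetical names, and unlink, persistence, and crash consistency are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical in-memory model; not the C-FFS implementation.


@dataclass
class Inode:
    nlink: int = 1


@dataclass
class LocalRef:
    """Entry pointing at another entry's embedded inode in the same directory."""
    target_name: str


@dataclass
class ExternalRef:
    """Entry referring by number to a freestanding inode outside any directory."""
    inum: int


Entry = Union[Inode, LocalRef, ExternalRef]


@dataclass
class FileSystem:
    dirs: Dict[str, Dict[str, Entry]] = field(default_factory=dict)
    external: Dict[int, Inode] = field(default_factory=dict)  # freestanding inodes
    next_inum: int = 1

    def resolve(self, d: str, name: str) -> Inode:
        entry = self.dirs[d][name]
        if isinstance(entry, LocalRef):              # same-directory pointer
            entry = self.dirs[d][entry.target_name]
        if isinstance(entry, ExternalRef):           # freestanding inode
            return self.external[entry.inum]
        return entry

    def link(self, src_dir: str, src_name: str, dst_dir: str, dst_name: str) -> None:
        entries = self.dirs[src_dir]
        if isinstance(entries[src_name], LocalRef):  # follow to the owning entry
            src_name = entries[src_name].target_name
        owner = entries[src_name]
        if isinstance(owner, Inode) and src_dir == dst_dir:
            # Same directory: the new name simply points at the embedded inode.
            owner.nlink += 1
            self.dirs[dst_dir][dst_name] = LocalRef(src_name)
            return
        if isinstance(owner, Inode):
            # Cross-directory link: move the inode out into a freestanding slot.
            inum = self.next_inum
            self.next_inum += 1
            self.external[inum] = owner
            owner = ExternalRef(inum)
            entries[src_name] = owner
        self.external[owner.inum].nlink += 1
        self.dirs[dst_dir][dst_name] = ExternalRef(owner.inum)
```

Keeping the common case (no cross-directory links) embedded and paying an extra indirection only for the rare case matches Greg's observation that hard links are uncommon.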
The second technique, known as explicit grouping, places the data of small files named by the same directory next to each other on disk so that they can be read into the system as one unit. By exploiting locality in the namespace in this way, C-FFS can substantially reduce the number of disk requests required for small file workloads. Also, because most inodes are embedded, this grouping can be done more aggressively than would be possible with the Fast File System.
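The payoff of explicit grouping can be estimated by counting disk requests. This sketch packs a directory's small files, in name order, into fixed-size groups and compares one request per group against one request per file. The group size, the small-file cutoff, and the assumption that each ungrouped file needs its own positioning are illustrative choices, not parameters taken from the paper.

```python
from typing import Dict, List

GROUP_BLOCKS = 16       # assumed group size in blocks (illustrative only)
SMALL_FILE_BLOCKS = 8   # assumed cutoff: larger files are allocated separately


def assign_groups(file_blocks: Dict[str, int]) -> List[List[str]]:
    """Pack a directory's small files, in name order, into contiguous groups."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name in sorted(file_blocks):
        size = file_blocks[name]
        if size > SMALL_FILE_BLOCKS:
            continue                    # large files keep their own extents
        if used + size > GROUP_BLOCKS:  # group full: start a new one
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups


def requests_to_read_directory(file_blocks: Dict[str, int], grouped: bool) -> int:
    """Disk requests needed to read every small file named by one directory."""
    if not grouped:
        # Assumption: each small file sits apart and needs its own request.
        return sum(1 for s in file_blocks.values() if s <= SMALL_FILE_BLOCKS)
    return len(assign_groups(file_blocks))  # one request per group


files = {f"src{i}.c": 2 for i in range(20)}
print(requests_to_read_directory(files, grouped=False),
      requests_to_read_directory(files, grouped=True))  # prints: 20 3
```

For twenty two-block files, the ungrouped layout needs 20 requests and the grouped one needs 3. Real FFS often places some files of a directory near each other, so this overstates its cost; the point is only that request count, not bytes moved, dominates small-file workloads.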
The performance measurements show significant improvements over McKusick's Fast File System: 500% to 700% in small file throughput and 10% to 300% for real software development applications. Performance with large files does not degrade, which was an important criterion.
The paper was well presented and, I think, contains some interesting ideas. It was a worthy winner of the Best Paper Award.
Originally published in ;login: Vol. 22, No. 2, April 1997.
Last changed: May 28, 1997 pc