
Large File Performance

 
Figure 7: Large file performance. The benchmark sequentially writes a 10 MB file, reads it back sequentially, writes it again randomly (both asynchronously and synchronously for the UFS runs), reads it again sequentially, and finally reads it randomly.

In the second benchmark, we write a 10 MB file sequentially, read it back sequentially, write 10 MB of data randomly to the same file, read it back sequentially again, and finally read 10 MB of random data from the file. The performance of random I/O is also an indication of how the system behaves when a large number of independent workloads are interleaved on the disk. The benchmark is again run on empty disks. Figure 7 shows the results. The writes are asynchronous, with the exception of the two random write runs on UFS that are labeled "Sync". Neither the LFS cleaner nor the VLD compactor runs during the benchmark.
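For concreteness, the following sketch outlines the access pattern of the five phases. The 8 KB transfer size, the file name, and the use of O_SYNC as a stand-in for the synchronous UFS runs are illustrative assumptions, not details of our actual benchmark harness.

    /* Sketch of the large file benchmark's five phases.  The 8 KB
     * transfer size, the file name, and the use of O_SYNC for the
     * "Sync" runs are illustrative assumptions. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define FILE_SIZE (10 * 1024 * 1024)        /* 10 MB */
    #define BLOCK     (8 * 1024)                /* assumed transfer size */
    #define NBLOCKS   (FILE_SIZE / BLOCK)

    static char buf[BLOCK];

    static void sequential(int fd, int writing)
    {
        lseek(fd, 0, SEEK_SET);
        for (int i = 0; i < NBLOCKS; i++)
            (void)(writing ? write(fd, buf, BLOCK) : read(fd, buf, BLOCK));
    }

    static void random_io(int fd, int writing)
    {
        for (int i = 0; i < NBLOCKS; i++) {
            lseek(fd, (off_t)(rand() % NBLOCKS) * BLOCK, SEEK_SET);
            (void)(writing ? write(fd, buf, BLOCK) : read(fd, buf, BLOCK));
        }
    }

    int main(void)
    {
        int fd = open("bigfile", O_RDWR | O_CREAT, 0644); /* add O_SYNC for the "Sync" runs */
        sequential(fd, 1);   /* phase 1: sequential write */
        sequential(fd, 0);   /* phase 2: sequential read  */
        random_io(fd, 1);    /* phase 3: random write     */
        sequential(fd, 0);   /* phase 4: sequential read  */
        random_io(fd, 0);    /* phase 5: random read      */
        close(fd);
        return 0;
    }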

We first point out a few characteristics that are the result of implementation artifacts. The first two phases of the LFS performance are not as good as those of UFS because the user-level LFS implementation is less efficient than the native in-kernel UFS. LFS also disables prefetching, which explains its low sequential read bandwidth. Sequential reads on UFS run much faster than sequential writes on the regular disk because of aggressive prefetching both at the file system level and inside the disk. With these artifacts explained, we now examine a number of interesting performance characteristics.

First, sequential read after random write performs poorly in all LFS and VLD systems because both logging and eager writing destroy spatial locality. This is a problem that may be solved by a combination of caching, data reorganization, hints at interfaces, and prefetching as explained in Section 3.4.

Second, the LFS random write bandwidth is higher than that of sequential write. This is because, during the random write phase, some blocks are written multiple times and are overwritten before they are flushed, so fewer bytes reach the disk. This is a benefit of delayed writes.
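A back-of-the-envelope calculation illustrates the effect. If we assume the random write phase issues block-sized writes uniformly over the 10 MB file and that the cache absorbs every overwrite (both assumptions for illustration only), the expected fraction of distinct blocks, and therefore of bytes that must reach the disk, is 1 - (1 - 1/N)^N, or roughly 63%:

    /* Expected fraction of distinct blocks touched when N uniformly
     * random block writes hit an N-block file; only these blocks need
     * to reach the disk if delayed writes absorb the overwrites.
     * The 8 KB block size is an assumption. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double n = 10.0 * 1024 * 1024 / (8 * 1024);    /* blocks in a 10 MB file */
        printf("fraction reaching disk: %.2f\n", 1.0 - pow(1.0 - 1.0 / n, n));
        return 0;
    }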

Third, under UFS, it is not surprising that the synchronous random writes do well on the VLD, but it is interesting that even the sequential writes perform better on the VLD. This is because sequential writes on the regular disk occasionally and inadvertently miss disk rotations. Interestingly enough, this phenomenon does not occur on the slower HP97560 disk. This evidence supports our earlier contention that tuning the host operating system to match changing technologies is indeed a difficult task. The approach of running the file system inside the disk in general, and the concept of a virtual log in particular, can simplify such efforts.
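The cost of a missed rotation is easy to see with assumed numbers; neither the rotational speed nor the miss rate below is a measurement from our simulator.

    /* Rough cost of inadvertently missed rotations during sequential
     * writes.  The 10,000 RPM spindle, per-block transfer time, and
     * miss rate are all assumed for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double revolution_ms = 60000.0 / 10000.0;  /* 6 ms per revolution */
        double xfer_ms = 0.1;                      /* assumed per-block transfer time */
        double miss_rate = 0.05;                   /* assumed: 1 in 20 writes misses */

        /* Each miss stalls the stream for roughly one full revolution. */
        double avg_ms = xfer_ms + miss_rate * revolution_ms;
        printf("effective slowdown: %.1fx\n", avg_ms / xfer_ms);
        return 0;
    }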

Fourth, although our regular disk simulator does not implement disk queue sorting, UFS does sort the asynchronous random writes when flushing them to disk (sketched below). Therefore, the performance of this phase of the benchmark, which is also worse on the regular disk than on the VLD for the reason described above, is a best-case scenario of what disk queue sorting can accomplish. In general, disk queue sorting is likely to be even less effective when the disk queue length is short compared to the working set size. A similar phenomenon can occur in a write-ahead logging system whose log is small compared to the size of the database. The VLD-based systems need not suffer from these limitations. In summary, the benchmark further demonstrates the power of combining lazy writing by the file system with eager writing by the disk.
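The sorting that UFS applies before flushing amounts to ordering the pending requests by block address so that the head sweeps across the disk once. The request structure and names below are illustrative, not taken from the UFS source.

    /* Sketch of disk queue sorting: pending asynchronous writes are
     * ordered by block address before being issued.  The benefit
     * shrinks when the queue is short relative to the working set. */
    #include <stdlib.h>

    struct request { long block; void *data; };

    static int by_block(const void *a, const void *b)
    {
        long ba = ((const struct request *)a)->block;
        long bb = ((const struct request *)b)->block;
        return (ba > bb) - (ba < bb);
    }

    void flush_queue(struct request *q, int n)
    {
        qsort(q, n, sizeof q[0], by_block);
        /* issue q[0..n-1] to the disk in ascending block order ... */
    }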


