Measuring Predictive Prefetching

Accurately measuring the effectiveness of predictive prefetching presented a significant problem in itself. Most file system benchmarks, such as PostMark [9], use a randomly generated workload. Since our work is based on the observation that file access patterns are not random, these benchmarks offer little potential for measuring predictive prefetching. In fact, many researchers [6,12,13,23,16,3,18,25] have shown that such a random workload does not accurately represent file system activity.

Previously [13], we used traces of file system activity over a one-month period from four different machines to show that PCM-based predictions can predict the next access with an accuracy of 0.82. Across the four traces, the accuracy measures ranged from 0.78 to 0.88. These four traces were chosen to represent the most diverse set of I/O characteristics from the 33 different machines traced. Even across the widest possible range of I/O characteristics, the one characteristic that was uniform across all traces was predictability. Unfortunately, most existing benchmarks lack any such predictability.
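To make the contrast between predictable and random workloads concrete, the following is a minimal sketch that scores a simple last-successor predictor on two synthetic traces. It is not the PCM model used in [13]; the last-successor rule is swapped in only because it is short, and the file names, trace lengths, and function name are all hypothetical.

```python
import random

def last_successor_accuracy(trace):
    """Fraction of transitions where the file that followed `prev`
    last time is also the file that follows it this time.
    A transition with no earlier observation counts as a miss."""
    successor = {}  # file -> file that most recently followed it
    correct = 0
    total = 0
    for prev, nxt in zip(trace, trace[1:]):
        total += 1
        if successor.get(prev) == nxt:
            correct += 1
        successor[prev] = nxt  # learn from every transition
    return correct / total if total else 0.0

# Hypothetical file set, standing in for a repeated build.
files = ["Makefile", "main.c", "util.h", "util.c", "main.o"]
build_trace = files * 50

# Uniformly random accesses over the same files (fixed seed).
rng = random.Random(0)
random_trace = [rng.choice(files) for _ in range(len(build_trace))]

build_acc = last_successor_accuracy(build_trace)
random_acc = last_successor_accuracy(random_trace)
print(f"repeated build: {build_acc:.2f}, random: {random_acc:.2f}")
```

On the repeated trace only the first pass through the files is mispredicted, while on the random trace the predictor does little better than guessing among the five files. This is the sense in which a randomly generated benchmark leaves a predictive prefetcher nothing to learn.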

Replaying our traces on a live system was another method we considered for testing predictive prefetching. While these traces did contain a record of all system calls, page fault data was not recorded. Unfortunately, one common source of I/O requests is page faults that result from memory-mapped executables and data files, so a replayed trace would omit this I/O. As a result, an application-based benchmark that consists of executing specific programs (and incurs the associated page faults) would more accurately represent a realistic file system workload.

For these reasons, we chose to use application-based benchmarks to provide a basic but realistic measure of how well predictive prefetching would do under some well-defined conditions. While these benchmarks don't represent a real-world workload, they do provide a workload that is more realistic than that of random file access benchmarks or replayed traces. To provide enough data samples to obtain confidence intervals for our measures, we ran each benchmark 20 times. While such repetition lacks the additional variety that would occur in many real-world workloads, this workload is similar to those seen by a nightly build process or the traversal of a set of data files (e.g., indexing of man pages).
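As an illustration of how 20 runs yield a confidence interval, here is a minimal sketch using Student's t distribution. The text does not state the confidence level or the method used, so the 95% level and the t-based interval are assumptions, and the elapsed times are made-up placeholder values, not measurements from this work.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 19 degrees of
# freedom (20 runs). The 95% level is an assumption.
T_CRIT_95_DF19 = 2.093

def confidence_interval(samples, t_crit=T_CRIT_95_DF19):
    """Return (low, high) bounds for the mean of `samples`.
    `t_crit` must match len(samples) - 1 degrees of freedom."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width

# Placeholder elapsed times in seconds for 20 hypothetical runs.
elapsed = [41.2, 40.8, 41.5, 40.9, 41.1, 41.3, 40.7, 41.0, 41.4, 40.6,
           41.2, 41.0, 40.9, 41.3, 41.1, 40.8, 41.2, 41.0, 41.4, 40.9]

low, high = confidence_interval(elapsed)
print(f"mean {statistics.mean(elapsed):.2f} s, "
      f"95% CI [{low:.2f}, {high:.2f}] s")
```

Two configurations, for example with and without prefetching, can then be compared by checking whether their intervals overlap.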

Finally, we should note that predictive prefetching suffers from the same compulsory miss problems that an LRU cache does. Specifically, if our system hasn't previously seen an access pattern, then there is no way it can recognize that pattern, predict a file's access, and prefetch the file's data. This means that the system must see a given pattern at least once before it can recognize it. As a result, we must train on an access pattern to a set of files before we can meaningfully test predictive prefetching over that pattern. Our SSH benchmark addresses this concern by changing the source code base across several versions without any re-training, thus measuring the performance of our predictive prefetching system over a changing code base.
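The train-then-test requirement can be sketched with a frozen model: train once on one version of a file set, then score predictions on a later version without updating. This is a simplified illustration, not the system evaluated here; it uses a last-successor table rather than the actual predictor, and the file lists are hypothetical stand-ins for two versions of a source tree.

```python
def train(trace):
    """Build a table mapping each file to the file that last followed it."""
    model = {}
    for prev, nxt in zip(trace, trace[1:]):
        model[prev] = nxt
    return model

def frozen_accuracy(model, trace):
    """Score predictions on `trace` without updating `model`."""
    pairs = list(zip(trace, trace[1:]))
    hits = sum(1 for prev, nxt in pairs if model.get(prev) == nxt)
    return hits / len(pairs) if pairs else 0.0

# Hypothetical access order for two versions of a code base;
# version 2 adds one file in the middle of the sequence.
v1 = ["configure", "ssh.c", "packet.c", "cipher.c", "log.c"]
v2 = ["configure", "ssh.c", "packet.c", "auth.c", "cipher.c", "log.c"]

untrained = frozen_accuracy({}, v1)   # nothing seen yet: every access misses
model = train(v1)
same = frozen_accuracy(model, v1)     # the trained pattern is fully recognized
changed = frozen_accuracy(model, v2)  # only transitions near the new file miss
print(untrained, same, changed)
```

With no training every prediction is a compulsory miss. After training on the first version, the unchanged transitions in the second version are still predicted, and only the transitions into and out of the new file are missed.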

Tom M. Kroeger
2001-05-01