

Our Contributions

A prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either asynchronous (when it can prefetch on a hit) or synchronous (when it can prefetch only on a miss). This naturally leads to four classes, which we call Fixed Synchronous (FS), Fixed Asynchronous (FA), Adaptive Synchronous (AS), and Adaptive Asynchronous (AA).
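The distinction can be made concrete by looking at when each class fires a prefetch and how much it fetches. The following Python sketch is purely illustrative; the names (Stream, pages_to_prefetch, FIXED_DEGREE, on_trigger_page) are our own assumptions, not code from any of the algorithms studied.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    adaptive_degree: int = 8       # per-stream degree used by AS and AA
    on_trigger_page: bool = False  # did this hit land on the trigger page?

FIXED_DEGREE = 8  # an assumed constant for the fixed-degree classes

def pages_to_prefetch(stream: Stream, hit: bool, algo: str) -> int:
    # Synchronous classes (FS, AS) prefetch only on a miss; asynchronous
    # classes (FA, AA) may also prefetch on a hit, here modeled as a hit
    # to a designated trigger page within the prefetched data.
    if algo in ("FS", "AS"):
        fire = not hit
    else:  # "FA", "AA"
        fire = (not hit) or stream.on_trigger_page
    if not fire:
        return 0
    # Fixed classes (FS, FA) use a constant degree; adaptive classes
    # (AS, AA) use a per-stream, workload-driven degree.
    return FIXED_DEGREE if algo in ("FS", "FA") else stream.adaptive_degree
```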

Although sequential and non-sequential data typically occupy the same cache, it is worthwhile to examine the prefetched data in isolation: most known prefetching algorithms suffer from cache pollution and prefetch wastage, and can therefore be improved.

We examine the case where an LRU (Least Recently Used) cache houses prefetched data for multiple concurrent sequential streams. We provide a theoretical analysis and prove sufficient conditions for optimal online cache management of steady-state sequential streams. We also provide a simple implementation called AMP, the first member of the AA class, which optimally adapts both the degree of prefetch and its timing according to the workload and cache size constraints.
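To make the adaptation concrete, here is a minimal Python sketch, under our own assumptions, of per-stream adaptation of both the degree of prefetch p and the trigger distance g in the spirit of an AA algorithm: p grows while prefetched groups are fully used and shrinks when prefetched pages are evicted unread, while g grows when prefetch completes too late (the stream catches up and stalls). The names and exact update rules are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class StreamState:
    p: int = 4  # degree of prefetch: pages fetched per trigger
    g: int = 1  # trigger distance: prefetch fires this many pages
                # before the end of the current prefetched group

def on_group_consumed(s: StreamState, stalled: bool) -> None:
    # Called when a stream finishes reading a prefetched group.
    # stalled: the stream reached the end of the group before the next
    # group's prefetch completed, i.e. it took a synchronous miss.
    s.p += 1                     # group was useful: prefetch more next time
    if stalled:
        s.g = min(s.g + 1, s.p)  # trigger earlier so data arrives in time

def on_unread_eviction(s: StreamState) -> None:
    # Called when the last page of a prefetched group is evicted unread.
    s.p = max(s.p - 1, 1)        # prefetched too much: back off
    s.g = min(s.g, s.p)          # keep the trigger within the group
```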

With a theoretically optimal design, AMP minimizes prefetch wastage and cache pollution within the prefetched data while maximizing the aggregate throughput achieved by the sequential streams. To demonstrate the effectiveness of AMP, we compare it with 9 other prefetching algorithms, including the best representatives of the FA, FS, and AS classes, over a wide range of cache sizes, request rates, request sizes, numbers of concurrent streams, and workloads.

We observe that AMP convincingly outperforms all the FA, FS, and AS algorithms for any number of streams and over all cache sizes. In an experiment with 100 concurrent sequential streams and varying cache sizes, AMP beats the FA, FS, and AS algorithms by 29-172%, 12-24%, and 21-210% respectively, while outperforming no prefetching and OBL by a factor of 8. AMP is consistently the best-performing algorithm in both the small-cache and large-cache scenarios, even for complex workloads like SPC1-Read. For the SPC2 Video-on-Demand workload, AMP can support at least 25% more streams than the next best algorithm. Even for streams with short sequences, where optimality is more elusive, AMP surpasses all the other contenders in overall performance.

