
What to Prefetch

The most common prefetching approach is sequential readahead. The simplest form is One Block Lookahead (OBL), where we prefetch one block beyond the requested block [18]. OBL comes in three variants: (i) always prefetch - prefetch the next block on every reference; (ii) prefetch on miss - prefetch the next block only on a miss; (iii) tagged prefetch - prefetch the next block only when the referenced block is accessed for the first time. P-Block Lookahead extends OBL by prefetching P blocks instead of one, where P is also referred to as the degree of prefetch. Dahlgren [19] proposed a version of the P-Block Lookahead algorithm that dynamically adapts the degree of prefetch to the workload. Tcheun [20] suggested a per-stream scheme that selects the appropriate degree of prefetch on each miss based on a prefetch degree selector (PDS) table. For the case where cache space is abundant, Infinite-Block Lookahead has also been studied [21].
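The three OBL variants differ only in when the next block is fetched. A minimal simulation sketch, with hypothetical class and method names (not from any cited paper), makes the distinction concrete:

```python
# Sketch of the three One Block Lookahead (OBL) variants over a block cache.
# Illustrative only: class and method names are hypothetical.

class OBLCache:
    def __init__(self, policy):
        # policy is one of "always", "on_miss", "tagged"
        self.policy = policy
        self.cache = set()       # blocks currently resident
        self.tagged = set()      # prefetched blocks not yet referenced
        self.prefetches = 0

    def _prefetch(self, block):
        if block not in self.cache:
            self.cache.add(block)
            self.tagged.add(block)   # mark as prefetched-but-unreferenced
            self.prefetches += 1

    def access(self, block):
        """Reference a block; returns True on a cache hit."""
        hit = block in self.cache
        # "first reference" = a miss, or a hit on a still-tagged prefetch
        first_reference = (not hit) or (block in self.tagged)
        self.cache.add(block)
        self.tagged.discard(block)

        if self.policy == "always":
            self._prefetch(block + 1)            # on every reference
        elif self.policy == "on_miss" and not hit:
            self._prefetch(block + 1)            # only on a miss
        elif self.policy == "tagged" and first_reference:
            self._prefetch(block + 1)            # only on first reference
        return hit
```

On a purely sequential scan, tagged prefetch keeps the readahead going (each prefetched block's first reference triggers the next prefetch), whereas prefetch-on-miss stalls every other block because hits on prefetched blocks trigger nothing.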

Stride-based prefetching has also been studied, mainly for processor caches, where strides are detected based on information provided by the application [22], a lookahead into the instruction stream [23], or a reference prediction table indexed by the program counter [24]. Dahlgren [25] found that sequential prefetching is often the better choice, because most strides lie within the block size and sequential prefetching also exploits spatial locality.
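A reference prediction table keeps, per program counter, the last address seen and the last observed stride, and only issues a prediction once the stride repeats. The following is a simplified sketch in the spirit of such PC-indexed schemes; the names and the two-observation confirmation rule are illustrative assumptions, not the exact mechanism of [24]:

```python
# Sketch of a reference prediction table (RPT) for stride detection.
# Names and the confirmation policy are illustrative assumptions.

class StridePredictor:
    def __init__(self):
        # pc -> (last_addr, last_stride)
        self.table = {}

    def observe(self, pc, addr):
        """Record an access by instruction `pc` to address `addr`.
        Returns the predicted next address once the stride has
        repeated, otherwise None."""
        if pc not in self.table:
            self.table[pc] = (addr, 0)
            return None
        last_addr, last_stride = self.table[pc]
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        if stride != 0 and stride == last_stride:
            return addr + stride     # stride seen twice: predict next
        return None
```

For example, a load instruction walking an array with an 8-byte stride yields a prediction from its third access onward.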

History-based prefetching has been proposed in various forms. Grimsrud [26] used a history-based table to predict the next pages to prefetch. Prefetching using Markov predictors has been studied in [27], wherein multiple predicted memory locations are prefetched at the same time. Data compression techniques have also been applied to predict future access patterns [12]. Vitter [28] provided an optimal (in terms of the miss ratio) prefetching technique based on the Lempel-Ziv algorithm. Lei [29] suggested a file prefetching technique based on historical access correlations maintained in the form of access trees.
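The core idea behind Markov-predictor prefetching can be sketched as a first-order transition table: for each block, count which blocks have historically followed it, and prefetch the top candidates. This is a simplified illustration under that assumption, not the exact design of [27]:

```python
from collections import defaultdict, Counter

# Sketch of a first-order Markov prefetch predictor: track observed
# successors of each block and predict the k most frequent ones.
# Illustrative only; real designs bound table size and add confidence.

class MarkovPredictor:
    def __init__(self, k=2):
        self.k = k
        self.successors = defaultdict(Counter)  # block -> Counter of next blocks
        self.prev = None

    def access(self, block):
        """Record a reference; return up to k predicted next blocks,
        ranked by observed frequency."""
        if self.prev is not None:
            self.successors[self.prev][block] += 1
        self.prev = block
        return [b for b, _ in self.successors[block].most_common(self.k)]
```

Unlike sequential or stride schemes, such a predictor can issue multiple candidate prefetches per reference, which is exactly what makes its extra reads costly when accuracy is low.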

In practice, most commercial data storage systems use very simple prefetching schemes such as sequential prefetching. This is because, in data servers, only sequential prefetching achieves high long-term predictive accuracy. Strides that cross page or track boundaries are uncommon in storage workloads and are therefore not worth implementing. History-based prefetching suffers from low predictive accuracy and from the cost of the extra reads it imposes on an already bottlenecked I/O system. A data storage system cannot use most hardware-initiated or software-initiated prefetching techniques, since the applications typically run on external hardware. Further, offline algorithms [30,31,32,33] are not applicable, as they require knowledge of future data accesses.


root 2006-12-19