

Theoretical Analysis: Optimality Criteria

In this section we theoretically analyze the case in which an LRU cache is shared by prefetched data for multiple steady sequential streams. We assume an AA algorithm operating with a cache of size $C$. $L$ is defined as the average life of a page in the cache. Each stream has a maximum request rate, $r$, which is achieved when all read requests are hits in the cache. The aggregate throughput, $B$, is the sum of the individual stream throughputs ($B = \sum_{i} b_i$).

Figure 2: Starting at $3$ ms, the average response time grows linearly with the read size. We observed similar behavior for RAID-$5$, implying that the optimality criteria in this paper apply to RAID as well.
[Figure 2: plot of Average Read Response Time vs. Read Size]

Observation 3.1   $p/t(p)$ is a monotonically non-decreasing function of $p$, where $t(p)$ is the average time to prefetch $p$ pages.

Proof. From Figure 2 we observe that $t(p)$ is of the form $kp + c$, with $k$ and $c$ positive. Hence, $p/t(p) = \frac{p}{kp+c}$, and its slope

$\displaystyle \frac{d}{dp}\left(\frac{p}{kp+c}\right) = \frac{c}{(kp+c)^2}$

is positive since $c$ is positive. $\qedsymbol$
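Observation 3.1 is easy to check numerically. A minimal sketch, assuming the linear latency model $t(p) = kp + c$ from Figure 2; the constants `k` and `c` are illustrative placeholders, not measured values from the paper:

```python
# Sketch of Observation 3.1 under the linear latency model t(p) = k*p + c.
# The constants k and c are assumed for illustration only.
def t(p, k=0.1, c=3.0):
    """Average time to prefetch p pages (linear model from Figure 2)."""
    return k * p + c

def prefetch_rate(p):
    """p/t(p): pages prefetched per unit time for a prefetch of size p."""
    return p / t(p)

# p/t(p) is monotonically non-decreasing in p.
rates = [prefetch_rate(p) for p in range(1, 500)]
assert all(a <= b for a, b in zip(rates, rates[1:]))
```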

Definition 3.1   A stream is said to be satisfied at $(p,g)$ if it experiences no misses for the given $p$ and $g$.

Lemma 3.1   A stream is satisfied at $(p,g)$ iff $t(p) \leq (g+1)/r$.

Proof. By definition, if a stream is satisfied at $(p,g)$ then it experiences no misses in the steady state. This implies that the prefetch issued at the trigger distance $g$ completes before the $g+1$ pages are read by the client. Therefore, $t(p) \leq (g+1)/r$, where $r$ is the request rate of the stream. The reverse is also true: if the time it takes to prefetch $p$ pages is not more than the time it takes the client to read the $g+1$ pages, then there can be no misses in the steady state, implying that the stream is satisfied. $\qedsymbol$
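Lemma 3.1 yields a direct satisfaction test. A small sketch, reusing the assumed linear model for $t(p)$ (the constants are hypothetical):

```python
def t(p, k=0.1, c=3.0):
    """Assumed linear prefetch latency model; k and c are illustrative."""
    return k * p + c

def satisfied(p, g, r):
    """Lemma 3.1: a stream is satisfied at (p, g) iff the prefetch of p
    pages finishes within the (g+1)/r time the client needs to consume
    the g+1 remaining pages."""
    return t(p) <= (g + 1) / r

# Example: at r = 2 pages/unit, prefetching p = 10 takes t(10) = 4 units,
# so a trigger distance of g = 7 gives exactly (7+1)/2 = 4 units of slack.
assert satisfied(10, 7, 2)
assert not satisfied(10, 6, 2)
```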

Observation 3.2   The throughput of a satisfied stream is equal to $r$, its request rate.

Proof. By definition, a satisfied stream experiences no misses in the steady state. No misses implies no stall time, and the stream proceeds at its desired request rate of $r$. $\qedsymbol$

Lemma 3.2   Cache pollution occurs if $(g+1) > \lceil r\cdot t(p) \rceil$.

Proof. If $g+1 > \lceil r\cdot t(p) \rceil$, then $g \geq \lceil r\cdot t(p) \rceil \geq r\cdot t(p)$, i.e.,
$t(p) \leq g/r < (g+1)/r$.
By Lemma 3.1, the stream is satisfied at the chosen $(p,g)$; but since $t(p) \leq g/r = ((g-1)+1)/r$, it is also satisfied at $(p,g-1)$. By Observation 3.2, the throughput with $g-1$ remains the same, as the stream remains satisfied. However, the lower trigger distance costs less, since the average number of pages that must be kept in the cache is smaller. Hence, cache space is being wasted without any gain in throughput. $\qedsymbol$
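The pollution condition of Lemma 3.2 can be expressed the same way (again under the assumed latency model with illustrative constants):

```python
import math

def t(p, k=0.1, c=3.0):
    """Assumed linear prefetch latency model; k and c are illustrative."""
    return k * p + c

def polluting(p, g, r):
    """Lemma 3.2: cache pollution occurs when g+1 exceeds ceil(r*t(p)),
    i.e., the trigger fires earlier than the prefetch latency requires."""
    return (g + 1) > math.ceil(r * t(p))

# With r = 2 and p = 10, r*t(p) = 8: g = 7 is the largest useful
# trigger distance, and g = 8 already wastes cache space.
assert polluting(10, 8, 2)
assert not polluting(10, 7, 2)
```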

Lemma 3.3   If there is no cache pollution, wastage occurs iff $p/r > L$.

Proof. If there is no cache pollution, then $(g+1) \leq \lceil r\cdot t(p) \rceil$ (Lemma 3.2). By definition, the next $p$ pages are requested when $g+1$ pages from the previous prefetch are still unaccessed. The number of these pages consumed in the time it takes to prefetch is $r\cdot t(p)$, which is roughly all of the unaccessed pages. Hence, as soon as the next $p$ pages are prefetched, they begin to be consumed at the request rate $r$, and the time it takes for the $p$ pages to be consumed after the prefetch is $p/r$. Now, if the average life ($L$) of pages in the cache is less than $p/r$, then some of the prefetched pages will be evicted before they are requested. Conversely, if $L$ is at least $p/r$, then all $p$ pages will be requested before they reach the LRU end of the list and face eviction. $\qedsymbol$
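Lemma 3.3's wastage criterion simply compares consumption time against cache life; a minimal sketch (the parameter values are made up):

```python
def wastage(p, r, L):
    """Lemma 3.3: with no pollution, some prefetched pages are evicted
    before use iff the time to consume them, p/r, exceeds the average
    cache life L."""
    return p / r > L

# It takes p/r = 10 time units to consume 20 pages at rate 2;
# a cache life of 9 is too short, so wastage occurs.
assert wastage(20, 2, 9)
assert not wastage(16, 2, 9)
```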

Lemma 3.4   If there is no cache pollution, the throughput of a stream is $b = \min(r, \frac{p}{t(p)}, \frac{r\cdot L}{t(p)})$.

Proof. The throughput of a stream cannot exceed its request rate ($r$). Further, since we use a single outstanding prefetch per stream at any time, the throughput cannot exceed the amount prefetched ($p$) divided by the time it takes to prefetch that amount, $t(p)$. In the case where $p > r\cdot L$, wastage occurs (Lemma 3.3) and only $r\cdot L$ of the $p$ pages will be accessed before being evicted; the throughput is then limited by $r\cdot L/t(p)$. $\qedsymbol$
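The three-way bound of Lemma 3.4 translates directly into code (latency model and numbers assumed for illustration):

```python
def t(p, k=0.1, c=3.0):
    """Assumed linear prefetch latency model; k and c are illustrative."""
    return k * p + c

def stream_throughput(p, r, L):
    """Lemma 3.4: b = min(r, p/t(p), r*L/t(p)) when there is no pollution."""
    return min(r, p / t(p), r * L / t(p))

# With r = 2 and L = 4: p = 10 keeps the stream satisfied (b = r = 2),
# while p = 20 exceeds r*L = 8 pages, so wastage caps b at
# r*L/t(20) = 8/5 = 1.6.
assert stream_throughput(10, 2, 4) == 2
assert abs(stream_throughput(20, 2, 4) - 1.6) < 1e-9
```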

Lemma 3.5   If there is no wastage, then $B\cdot L = C$.

Proof. The average life of a page ($L$) is equal to the time it takes to insert $C$ new pages at the MRU end of the cache. If there is no wastage, the rate of insertion into the cache is equal to the aggregate read throughput of all the streams ($B$). Therefore, $L = C/B$. $\qedsymbol$

Lemma 3.6   For a fixed choice of $p_1, p_2, \ldots, p_n$ and a cache of size $C$, the aggregate throughput ($B$) is unique when there is no wastage.

Proof. Suppose the aggregate throughput ($B$) were not unique. Without loss of generality, we would have $B' > B$, such that

$\displaystyle B' = \sum_{i = 1..n} \min(r_i, p_i/t(p_i), r_i\cdot L'/t(p_i))$

$\displaystyle B = \sum_{i = 1..n} \min(r_i, p_i/t(p_i), r_i\cdot L/t(p_i))$

Since there is no wastage, $p_i \leq r_i\cdot L$ for all streams, so the third term in each $\min$ expression, the only one that depends on the cache life, never determines the minimum. Thus $B' = B$, which is contrary to our assumption. $\qedsymbol$
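Lemma 3.6 can be sanity-checked numerically: for a fixed wastage-free choice of the $p_i$, the per-stream minima do not depend on the cache life. A sketch with made-up parameters:

```python
def t(p, k=0.1, c=3.0):
    """Assumed linear prefetch latency model; k and c are illustrative."""
    return k * p + c

def b(p, r, L):
    """Per-stream throughput from Lemma 3.4."""
    return min(r, p / t(p), r * L / t(p))

ps, rs = [10, 20], [2.0, 4.0]
# Both cache lives below are wastage-free (p_i <= r_i * L in every case),
# so the L-dependent term never determines the minimum.
B1 = sum(b(p, r, 6.0) for p, r in zip(ps, rs))
B2 = sum(b(p, r, 9.0) for p, r in zip(ps, rs))
assert B1 == B2
```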

Theorem 3.1   The aggregate throughput of $n$ streams sharing a cache of size $C$ with average cache life $L$ is maximized if $\forall i,\; p_i = \lfloor r_i\cdot L \rfloor$.

Proof. Given $n$ streams with request rates $r_1, r_2, \ldots, r_n$ and a cache of size $C$, let $B_T$ be the throughput obtained by the choice $\forall i,\; p^T_i = \lfloor r_i\cdot L \rfloor$. The theorem claims that $B_T$ is the maximum aggregate bandwidth ($B_{max}$) obtainable through any choice of the $p_i$.

We prove this by contradiction. Assume that $B_T < B_{max}$. Then there exists some choice of $p_1, p_2, \ldots, p_n$ such that the aggregate bandwidth of all the streams is $B_{max}$.

Since wastage and cache pollution can never increase aggregate throughput, we assume, without loss of generality, that $B_{max}$ is free from these inefficiencies.

If this choice of $p_i$ is the same as that specified by the theorem, then by Lemma 3.6, $B_T = B_{max}$, which is contrary to our assumption.

$\displaystyle \therefore \exists i: p_i \neq \lfloor r_i\cdot L \rfloor$ (1)

By Lemma 3.3, since there is no wastage, $p_i/r_i \leq L$ for every stream; as the $p_i$ are integral, it must be the case for $B_{max}$ that

$\displaystyle \forall i,\; p_i \leq \lfloor r_i\cdot L \rfloor$ (2)

It follows from (1) and (2), without loss of generality, that $p_1 < \lfloor r_1\cdot L \rfloor$.

Let us define a new set $p^1_1, p^1_2, \ldots, p^1_n$, where $p^1_1 = \lfloor r_1\cdot L^1 \rfloor$ and $\forall i \neq 1,\; p^1_i = p_i$. Here $L^1$ and $B^1$ are the new cache life and aggregate throughput values.

Since $B^1 \leq B_{max}$ (by definition of $B_{max}$), we have $L^1 \geq L$ (Lemma 3.5). By Lemma 3.4, $\forall i \neq 1,\; b^1_i \geq b_i$, as $p^1_i = p_i$ and $L^1 \geq L$. By Observation 3.1 and Lemma 3.4, $b^1_1 \geq b_1$, as $p^1_1 = \lfloor r_1\cdot L^1 \rfloor \geq \lfloor r_1\cdot L \rfloor > p_1$.

$\displaystyle \therefore B^1 = \sum b^1_i \geq \sum b_i = B_{max}$

Since $B^1 \leq B_{max}$, it follows that $B^1 = B_{max}$.

By repeating the above procedure for every stream with $p_i < \lfloor r_i\cdot L \rfloor$, we arrive at a set $p^n_1, p^n_2, \ldots, p^n_n$, where $\forall i,\; p^n_i = \lfloor r_i\cdot L^n \rfloor$ and $B^n = B_{max}$.

Since the choice of $p$ for each stream is then the same for $B^n$ and $B_T$, we have $B^n = B_T$ by Lemma 3.6.

$\displaystyle \therefore B^n = B_{max} = B_T $

which contradicts our assumption that $B_T < B_{max}$. $\qedsymbol$
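Taken together, Theorem 3.1 and Lemma 3.5 define a fixed point: $p_i = \lfloor r_i\cdot L \rfloor$ with $L = C/B$. The coupling can be sketched numerically, again under the assumed linear latency model (all names and constants here are illustrative, not the paper's implementation):

```python
import math

def t(p, k=0.1, c=3.0):
    """Assumed linear prefetch latency model; k and c are illustrative."""
    return k * p + c

def stream_throughput(p, r, L):
    """Lemma 3.4: b = min(r, p/t(p), r*L/t(p))."""
    return min(r, p / t(p), r * L / t(p))

def optimal_prefetch_sizes(rates, C, iters=50):
    """Iterate p_i = floor(r_i * L) (Theorem 3.1) together with
    L = C / B (Lemma 3.5) until the coupled values settle."""
    L = C / sum(rates)  # initial guess: every stream fully satisfied
    for _ in range(iters):
        ps = [max(1, math.floor(r * L)) for r in rates]
        B = sum(stream_throughput(p, r, L) for p, r in zip(ps, rates))
        L = C / B
    return ps, L

# Three streams sharing a 200-page cache: the cache is large enough that
# every stream stays satisfied, so B = sum(r_i) = 7 and L = 200/7.
ps, L = optimal_prefetch_sizes([1.0, 2.0, 4.0], C=200)
assert ps == [28, 57, 114]
```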

