Next: Online-Optimality of GreedyDual-Size Up: Cost-Aware WWW Proxy Caching Previous: Locality in Web Accesses

GreedyDual-Size Algorithm

The original GreedyDual algorithm is proposed by Young [You91b]. It is actually a range of algorithms, but we focus one particular version which is a generalization of LRU. It is concerned with the case when pages in a cache have the same size, but incur different costs to fetch from a secondary storage. The algorithm associates a value, H, with each cached page p. Initially, when a page is brought into cache, H is set to be the cost of bringing the page into the cache (the cost is always non-negative). When a replacement needs to be made, the page with the lowest H value is replaced, and then all pages reduce their H values by the minimum value of H over all the pages in the cache. If a page is accessed, its H value is restored to the cost of bringing it into the cache. Thus, the H values of recently accessed pages retain a larger portion of the original cost than those of pages that have not been accessed for a long time. By reducing the H values as time goes on and restoring them upon access, the algorithm integrates the locality and cost concerns in a seamless fashion.

To incorporate the difference sizes of the document, we extend the GreedyDual algorithm by setting H to cost/size upon an access to a document, where cost is the cost of bringing the document, and size is the size of the document in bytes. We called the extended version the GreedyDual-Size algorithm. The definition of cost depends on the goal of the replacement algorithm: cost is set to 1 if the goal is to maximize hit ratio, it is set to the downloading latency if the goal is to minimize average latency, and it is set to the network cost if the goal is to minimize the total cost.

At the first glance, GreedyDual-Size would require k subtractions when a replacement is made, where k is the number of documents in cache. However, a different way of recording H removes these subtractions. The idea is to keep an ``inflation'' value L, and let all future setting of H be offset by L. Figure 5 shows an efficient implementation of the algorithm.

Algorithm GreedyDual

Initialize L to 0. Process each request document in turn. The current request is for document p.

If p is already in the cache, set H(p) to L + c(p)/s(p).
If p is not in the cache,
- While there is not enough room in the cache for p,
  - Let L be the minimum H(q) over all pages q in the cache.
  - Evict q such that H(q) = L.
  - Bring p into the cache and set H(p) to be L+c(p)/s(p)

Figure 5: GreedyDual Algorithm.

Using this technique, GreedyDual-Size can be implemented by maintaining a priority queue on the documents, based on their H value. Handling a hit requires O(log k) time and handling an eviction requires O(log k) time, since in both cases a single item in the queue must be updated. More efficient implementations can be designed that make the common case of handling a hit requiring only O(1) time.

Proof.

Next: Online-Optimality of GreedyDual-Size Up: Cost-Aware WWW Proxy Caching Previous: Locality in Web Accesses

Pei Cao
Thu Oct 23 18:04:42 CDT 1997