
Our Contributions

In read caches, the cost of evicting any page from the cache is the same and very small; hence the objective of a read cache is simply to minimize the miss ratio. In contrast, we propose that the performance of a write destage algorithm depends upon two factors: (i) the total number of destages to disks, namely, the write miss ratio, and (ii) the average cost of each destage. Roughly speaking, the objective of a write destage algorithm is to minimize the product of these two terms. Minimizing this product yields the highest peak write throughput in the absence of any reads. Even in the presence of concurrent reads, minimizing this product reduces the amount of time that the disk heads are occupied in serving writes, which lowers the average read response time while maximizing aggregate throughput.

To minimize the first term, the algorithm should exploit temporal locality; that is, it should destage the data that is least likely to be written to amongst all data in the write cache. To minimize the second term, the algorithm should exploit spatial locality; that is, it should destage together data that are close on the disks, so as to exploit the position of the heads and the geometry of the disks in the system for higher throughput.

Classically, read caches have used the LRU (least recently used) policy to exploit temporal locality. The analogous policy for exploiting temporal locality in writes is known as LRW (least recently written), which destages the least recently written page [18,19,20].
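
The LRW policy above can be sketched in a few lines. This is an illustrative sketch only (the class and method names are our own, not from the paper); it keeps dirty pages in write order and destages the oldest one first.

```python
from collections import OrderedDict

class LRWCache:
    """Minimal sketch of an LRW write cache: destage the
    least-recently-written page first. Names are illustrative."""

    def __init__(self):
        # Maps page address -> data, ordered from least to most
        # recently written.
        self.pages = OrderedDict()

    def write(self, addr, data):
        # A rewrite moves the page to the most-recently-written end,
        # so it is destaged last.
        if addr in self.pages:
            self.pages.move_to_end(addr)
        self.pages[addr] = data

    def destage(self):
        # Evict the least-recently-written page.
        return self.pages.popitem(last=False)

cache = LRWCache()
for a in [10, 20, 30]:
    cache.write(a, b"x")
cache.write(10, b"y")        # 10 becomes the most recently written
addr, _ = cache.destage()    # 20 is now the least recently written
```

Note that LRW, like LRU, is oblivious to where pages live on disk: consecutive destages may force long seeks.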

Algorithms for exploiting spatial locality have been studied in the context of disk scheduling. Many such algorithms require detailed knowledge of the instantaneous position of the disk head, and exploit precise knowledge of the location of each data item relative to the head. In this paper, we work in the context of a storage controller, from which most disk parameters are hidden by the underlying RAID and disks. Hence, generally speaking, spatial locality is hard to exploit at such upper memory levels; see, for example, [21]. We argue, however, that one of these algorithms, namely CSCAN, which destages data in ascending order of logical address, can be applied reasonably successfully even at upper levels of the memory hierarchy. We empirically demonstrate that as the size of the NVS managed by CSCAN increases, the throughput of the system increases for a wide variety of workloads.
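
CSCAN's destage order can be sketched as follows. This is a simplified illustration under our own naming (not the paper's implementation): dirty pages are kept sorted by logical address, and a pointer sweeps upward through them, wrapping around to the lowest address when it runs off the top.

```python
import bisect

class CSCANQueue:
    """Sketch of CSCAN destage ordering: destage dirty pages in
    ascending logical-address order, wrapping around at the top.
    Names are illustrative."""

    def __init__(self):
        self.addrs = []    # sorted dirty logical addresses
        self.pointer = 0   # logical address of the last destage

    def add(self, addr):
        # Insert a dirty page address, keeping the list sorted.
        i = bisect.bisect_left(self.addrs, addr)
        if i == len(self.addrs) or self.addrs[i] != addr:
            self.addrs.insert(i, addr)

    def next_destage(self):
        # Pick the first address strictly above the pointer;
        # wrap to the lowest address if there is none.
        i = bisect.bisect_right(self.addrs, self.pointer)
        if i == len(self.addrs):
            i = 0
        addr = self.addrs.pop(i)
        self.pointer = addr
        return addr

q = CSCANQueue()
for a in [50, 10, 90, 30]:
    q.add(a)
order = [q.next_destage() for _ in range(4)]  # ascending: 10, 30, 50, 90
```

Because consecutive destages are adjacent in logical-address space, the seeks they induce tend to be short, which is exactly the spatial locality CSCAN exploits.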

The destage orders suggested by LRW and CSCAN are generally different; hence, it is only possible to exploit temporal locality or spatial locality, but not both. To emphasize: one reduces the number of disk seeks, which is the goal of caching, while the other reduces the cost of each disk seek, which is the goal of scheduling. This brings us to the main focus of this paper: a novel algorithm that combines both temporal and spatial locality. As our main contribution, we combine an approximation of LRU, namely, CLOCK, with CSCAN to construct a new, simple-to-implement algorithm, namely, Wise Ordering for Writes (WOW), that effectively achieves this goal. The key new idea is to maintain in CSCAN a recency bit akin to CLOCK's, and to skip destages of data that have recently been written to.
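
The combination can be sketched as follows. This is a hedged illustration of the key idea only, not the paper's exact algorithm: the conventions chosen here (new pages enter with the recency bit clear; a rewrite sets the bit; a skipped page has its bit cleared, buying it one more sweep) are assumptions of this sketch, and the names are our own.

```python
import bisect

class WOWQueue:
    """Sketch of the WOW idea: CSCAN's ascending-address destage
    order, augmented with a CLOCK-style recency bit. The destage
    pointer skips recently rewritten pages once, clearing their bit.
    Conventions and names are assumptions of this sketch."""

    def __init__(self):
        self.addrs = []    # sorted dirty logical addresses
        self.recency = {}  # addr -> recency bit
        self.pointer = 0   # logical address of the last destage

    def write(self, addr):
        if addr in self.recency:
            self.recency[addr] = 1   # rewrite: mark as recently written
        else:
            bisect.insort(self.addrs, addr)
            self.recency[addr] = 0

    def next_destage(self):
        # Sweep upward in address order, wrapping at the top; skip
        # (and clear the bit of) recently written pages, and destage
        # the first page whose recency bit is clear.
        while True:
            i = bisect.bisect_right(self.addrs, self.pointer)
            if i == len(self.addrs):
                i = 0
            addr = self.addrs[i]
            self.pointer = addr
            if self.recency[addr]:
                self.recency[addr] = 0   # spared for one more sweep
            else:
                self.addrs.pop(i)
                del self.recency[addr]
                return addr

q = WOWQueue()
for a in [10, 30, 50]:
    q.write(a)
q.write(30)   # rewrite: 30's recency bit is set, so it is skipped once
order = [q.next_destage() for _ in range(3)]   # 10, 50, then 30
```

The sweep retains CSCAN's short seeks, while the recency bit keeps frequently rewritten pages in the cache longer, absorbing repeated writes before they reach disk.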

To demonstrate the effectiveness of WOW, we used the following hardware, storage, and workload. The hardware was an IBM xSeries 345 dual-processor server running Linux, equipped with 4GB of RAM that was used as NVS. The storage consisted of 5 disks. Using software RAID, we created a RAID-5 array in a 4 data disks + 1 parity disk configuration; we also created a RAID-10 array in a 2 + 2 configuration. As the workload, we employed an earlier implementation of the Storage Performance Council's SPC-1 benchmark, which is now widely used by many vendors of storage systems [16,17]. We refer to our workload as SPC-1 Like.

In a set-up with a high degree of temporal locality (``a cache-sensitive configuration''), WOW delivers peak throughput that is 129% higher than CSCAN and 9% higher than LRW. In a set-up with very little temporal locality (``a cache-insensitive configuration''), WOW and CSCAN deliver peak throughput that is 50% higher than LRW. For a random write workload with nearly 100% misses, on RAID-10 with a cache of 64K 4KB pages, WOW and CSCAN deliver peak throughput that is 200% higher than LRW; similarly, on RAID-5 with a cache of 16K 4KB pages, WOW and CSCAN deliver peak throughput that is 147% higher than LRW. In summary, WOW has peak throughput better than or comparable to the best of CSCAN and LRW across a wide gamut of write cache sizes and workload configurations. In addition, even at lower throughputs, WOW has lower average response times than CSCAN and LRW. In another experiment, varying the cache size under the SPC-1 Like workload, we explore both cache-insensitive and cache-sensitive regimes. We clearly show that CSCAN is good for cache-insensitive regimes and LRW is good for cache-sensitive regimes, whereas WOW is evergreen and performs well across the whole range of cache sizes.

In summary, WOW is a practical algorithm that fundamentally enhances the capacity of a storage controller to perform more I/Os.

Binny Gill 2005-10-17