Check out the new USENIX Web site.


Queue Depth and When To Destage

To utilize the full throughput potential of a RAID array, or even of a single disk, it is crucial to issue multiple concurrent writes. This gives more choice to the scheduling algorithm inside the disks, which, by design, usually tries to maximize throughput without starving any I/Os. Furthermore, in RAID, the number of outstanding concurrent writes roughly dictates the number of disk heads that can be employed in parallel. The outstanding concurrent writes constitute a queue. As the queue length increases, both the throughput and the average response time increase, and reads suffer because, on average, they may have to wait longer. We choose a value, MAXQUEUE (say 20), as the maximum number of concurrent write requests to the disks, where a write request is a set of contiguous pages within one write group.
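As a rough illustration of what counts as one write request, the sketch below splits the dirty pages of a single write group into runs of contiguous pages and then caps how many requests are issued so that no more than MAXQUEUE are outstanding. It assumes pages are identified by integer offsets within the write group; the function names `write_requests` and `issue` are hypothetical and not taken from the original system.

```python
from typing import Iterable, List

MAXQUEUE = 20  # maximum number of concurrent write requests (value from the text)


def write_requests(dirty_pages: Iterable[int]) -> List[List[int]]:
    """Split one write group's dirty page offsets into runs of contiguous pages.

    Each run is one write request. Page offsets are assumed to be integers.
    """
    runs: List[List[int]] = []
    for page in sorted(set(dirty_pages)):
        if runs and page == runs[-1][-1] + 1:
            runs[-1].append(page)  # extends the current contiguous run
        else:
            runs.append([page])    # gap found: start a new write request
    return runs


def issue(pending: List[List[int]], outstanding: int,
          max_queue: int = MAXQUEUE) -> List[List[int]]:
    """Return the pending requests that can be issued without exceeding max_queue."""
    budget = max(0, max_queue - outstanding)
    return pending[:budget]
```

For example, `write_requests({3, 1, 2, 7, 8, 10})` yields three write requests, `[[1, 2, 3], [7, 8], [10]]`; with 19 writes already outstanding, `issue` releases only the first of them.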

We now turn our attention to the important decision of ``When to Destage'' that is needed in line 19 of Figure 4. At any time, we dynamically vary the number of outstanding destages according to how full the NVS actually is. We maintain a lowThreshold, initially set to 80% of the NVS size, and a highThreshold, initially set to 90% of the NVS size. If the NVS occupancy is below the lowThreshold and we were not destaging a sequential write group, we stop all destages. However, if the NVS occupancy is below the lowThreshold but the previous destage was marked sequential and the next candidate destage is also marked sequential, then we continue destaging at a slow and steady rate of 4 outstanding destages at any time. This ensures that sequences are not broken and that their spatial locality is fully exploited; it also takes advantage of the disks' sequential bandwidth. If the NVS occupancy is at or above the highThreshold, then we always go full throttle, that is, we destage at the maximum drain rate of MAXQUEUE outstanding write requests. Between lowThreshold and highThreshold, we vary the destage rate linearly, in a fashion similar to [11]: the fuller the NVS gets within this range, the faster the drain rate, that is, the larger the number of outstanding concurrent writes. Observe that the algorithm will not always use the maximum queue depth. Writing at full throttle regardless of the rate of new writes is generally bad for performance; what is desired is simply to keep up with the incoming write load without filling up the NVS. The convexity of the throughput versus response time curve indicates that a steady rate of destage is more effective than many destages at one time and very few at another. Dynamically ramping up the number of outstanding concurrent writes to reflect how full the NVS is helps to achieve this steady rate. In contrast, always destaging at full throttle leads to abrupt ``start'' and ``stop'' situations when the destage threshold is exceeded or reached, respectively.
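The destage-rate rule above can be sketched as a function that maps the current NVS occupancy to a target number of outstanding destages. This is a minimal sketch, not the original implementation: occupancy and thresholds are assumed to be measured in pages, the function name is hypothetical, and because the text does not say where the linear ramp starts, the sketch assumes it rises from 1 outstanding destage at lowThreshold toward MAXQUEUE at highThreshold. Sequential runs get special treatment only below lowThreshold, as described.

```python
def target_outstanding_destages(occupancy: int, low: int, high: int,
                                max_queue: int = 20,
                                prev_sequential: bool = False,
                                next_sequential: bool = False,
                                sequential_rate: int = 4) -> int:
    """Number of concurrent destages to keep outstanding at this NVS occupancy."""
    if occupancy >= high:
        return max_queue  # full throttle at or above highThreshold
    if occupancy < low:
        # Below lowThreshold: keep destaging slowly only so that a sequential
        # run is not broken; otherwise stop destaging entirely.
        if prev_sequential and next_sequential:
            return sequential_rate
        return 0
    # Between the thresholds: linear ramp, assumed to start at 1 at lowThreshold.
    # Integer arithmetic keeps the result a whole number of write requests.
    return 1 + (occupancy - low) * (max_queue - 1) // (high - low)
```

With a 1000-page NVS and the initial thresholds (lowThreshold = 800, highThreshold = 900 pages), an occupancy of 850 pages gives 10 outstanding destages, 900 pages gives the full 20, and 799 pages gives 0, or 4 if both the previous and next destages are sequential.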

We add one more idea: we dynamically adapt the highThreshold. Recall that write response times are negligible as long as NVS is empty enough to accommodate incoming requests, but can become quite large if NVS ever becomes full. We adapt the highThreshold to try to avoid this undesirable state while maximizing NVS occupancy, using a simple adaptive back-off and advance scheme. The lowThreshold is always set to highThreshold minus 10% of the NVS size. We define desiredOccupancyLevel to be 90% of the NVS size. The highThreshold is never allowed to be higher than desiredOccupancyLevel or lower than 10% of the NVS size. We maintain a variable, maxOccupancyObserved, that records the maximum occupancy of the cache since it was last reset. Now, if and when the NVS occupancy drops below the current highThreshold, we decrement the highThreshold by the positive difference, if any, between maxOccupancyObserved and desiredOccupancyLevel, and we reset maxOccupancyObserved to the current occupancy level. We record the number of destages that occur between two consecutive resettings of maxOccupancyObserved in the variable resetInterval. Of course, decrementing highThreshold lowers the average occupancy of NVS and reduces both the spatial and the temporal locality of writes. Thus, to counteract this decrementing force, if after a sufficient number of destages (say, equal to resetInterval) maxOccupancyObserved is still lower than desiredOccupancyLevel, then we increment highThreshold by the difference between desiredOccupancyLevel and maxOccupancyObserved, and we reset maxOccupancyObserved to the current occupancy level.
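The back-off and advance scheme can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the original implementation: occupancy is measured in pages, `observe` is assumed to be called whenever occupancy changes and `on_destage` after every completed destage, the initial resetInterval (100 here) is a hypothetical value that the text does not give, and resetInterval is kept at least 1. The class and method names are hypothetical.

```python
class AdaptiveThresholds:
    """Hypothetical sketch of the adaptive highThreshold back-off/advance scheme."""

    def __init__(self, nvs_pages: int, initial_reset_interval: int = 100) -> None:
        self.desired = nvs_pages * 9 // 10   # desiredOccupancyLevel: 90% of NVS
        self.floor = nvs_pages // 10         # highThreshold never below 10% of NVS
        self.gap = nvs_pages // 10           # lowThreshold = highThreshold - 10% of NVS
        self.high = self.desired             # highThreshold starts at 90% of NVS
        self.max_observed = 0                # maxOccupancyObserved
        self.reset_interval = initial_reset_interval  # assumed initial value
        self.destages_since_reset = 0
        self._above_high = False             # used to detect a drop below highThreshold

    @property
    def low(self) -> int:
        return self.high - self.gap

    def _clamp(self) -> None:
        # Keep highThreshold within [10% of NVS, desiredOccupancyLevel].
        self.high = max(self.floor, min(self.high, self.desired))

    def _reset(self, occupancy: int) -> None:
        # resetInterval records the destages between two consecutive resets.
        self.reset_interval = max(1, self.destages_since_reset)
        self.destages_since_reset = 0
        self.max_observed = occupancy
        self._above_high = occupancy >= self.high

    def observe(self, occupancy: int) -> None:
        """Call whenever NVS occupancy changes."""
        self.max_observed = max(self.max_observed, occupancy)
        if occupancy >= self.high:
            self._above_high = True
        elif self._above_high:
            # Occupancy just dropped below highThreshold: back off by any overshoot.
            self.high -= max(0, self.max_observed - self.desired)
            self._clamp()
            self._reset(occupancy)

    def on_destage(self, occupancy: int) -> None:
        """Call after each completed destage."""
        self.destages_since_reset += 1
        if (self.destages_since_reset >= self.reset_interval
                and self.max_observed < self.desired):
            # NVS stayed below the desired level for a whole interval: advance.
            self.high += self.desired - self.max_observed
            self._clamp()
            self._reset(occupancy)
```

In this sketch, with a 1000-page NVS, a burst that pushes occupancy to 950 pages lowers highThreshold from 900 to 850 pages once occupancy falls back below 900. If the following resetInterval destages never see occupancy reach 900, highThreshold climbs back by the shortfall, that is, 900 minus maxOccupancyObserved.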


Binny Gill 2005-10-17