Handling Bursts

A well-known characteristic of many IO workloads is a bursty arrival pattern: fluctuating resource demand due to device and application characteristics, access locality, and other factors. A high degree of burstiness makes it difficult to provide low latency and achieve proportionate allocation.

In our environment, bursty arrivals generally occur at two distinct time scales: systematic long-term ON-OFF behavior of VMs, and sudden short-term spikes in IO workloads. To handle long-term bursts, we modify the $ \beta $ value for a host based on the utilization of queue slots by its resident VMs. Recall that the host-level parameter $ \beta $ is proportional to the sum of the shares of all VMs (if $ s_i$ are the shares assigned to VM $ i$ , then for host $ h$ , $ \beta_h = K \times \sum_{i} s_i$ , where $ K$ is a normalization constant).
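As a minimal sketch of the relationship above (the function name and the value of $K$ are illustrative, not from the paper):

```python
def host_beta(vm_shares, K=1.0):
    """Host-level parameter: beta_h = K * sum_i s_i over the shares
    of all VMs resident on host h. K is a normalization constant."""
    return K * sum(vm_shares)

# Three resident VMs with shares 100, 200, and 100:
beta_h = host_beta([100, 200, 100])  # 400.0 with K = 1.0
```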

To adjust $ \beta $ , we measure the average number of outstanding IOs per VM, $ n_k$ , and compute each VM's share of the host window size, $ w_k$ , as:

$\displaystyle w_k = \frac{s_k}{\sum_{i} s_i}\, w(t)$ (4)

If $ n_k < w_k$ , we scale the shares of the VM to be $ s_k' = n_k \times s_k / w_k$ and use this value to calculate $ \beta $ for the host. Thus if a VM is not fully utilizing its window size, we reduce the $ \beta $ value of its host, so other VMs on the same host do not benefit disproportionately from the under-utilized shares of a colocated idle VM. In general, when one or more VMs become idle, the control mechanism will allow all hosts (and thus all VMs) to proportionally increase their window sizes and exploit the spare capacity.
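The share-scaling rule and Equation (4) can be combined into a single recomputation of $ \beta_h$ . The sketch below is a hedged illustration of that logic, assuming per-VM lists of shares and measured outstanding IOs; all names are hypothetical:

```python
def adjusted_beta(shares, outstanding, window, K=1.0):
    """Recompute beta_h, scaling down the shares of VMs that
    under-utilize their portion of the host window w(t).

    shares[k]      -- s_k, shares assigned to VM k
    outstanding[k] -- n_k, average number of outstanding IOs of VM k
    window         -- w(t), the current host window size
    """
    total = sum(shares)
    effective = []
    for s_k, n_k in zip(shares, outstanding):
        w_k = (s_k / total) * window        # Eq. (4): VM's window share
        if n_k < w_k:                       # VM not filling its share
            effective.append(n_k * s_k / w_k)   # s_k' = n_k * s_k / w_k
        else:
            effective.append(s_k)           # fully active: shares unchanged
    return K * sum(effective)

# Two equal-share VMs, window w(t) = 32, so w_k = 16 each.
# VM 0 keeps only 4 IOs outstanding and has its shares scaled down:
beta_h = adjusted_beta([100, 100], [4, 16], 32)  # 25 + 100 = 125.0
```

An idle VM ($ n_k = 0$ ) contributes nothing to $ \beta_h$ , so the host stops claiming capacity on its behalf.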

For short-term fluctuations, we use a burst-aware local scheduler. This scheduler allows a VM to accumulate a bounded number of credits while idle, and then to issue requests in bursts once it becomes active. This also improves overall IO efficiency, since requests from a single VM typically exhibit some locality. A number of schedulers support bursty allocations [13,22,6]. Our implementation uses SFQ as the local scheduler, but allows a bounded number of IOs to be batched from each VM instead of switching among VMs purely based on their SFQ request tags.
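The batching idea can be sketched as follows. This is not the paper's implementation; it is a simplified SFQ-like scheduler (start tags computed from per-VM finish tags and a global virtual time) extended with a hypothetical `batch` bound that lets the chosen VM dispatch several queued requests before tags are re-consulted:

```python
from collections import deque

class BatchedSFQ:
    """SFQ-style proportional-share scheduler that dispatches up to
    `batch` requests from the selected VM at a time, to exploit the
    locality of requests from a single VM."""

    def __init__(self, batch=4):
        self.batch = batch
        self.vtime = 0.0    # global virtual time
        self.queues = {}    # vm -> deque of pending request costs
        self.finish = {}    # vm -> finish tag of its last request
        self.weights = {}   # vm -> share-derived weight

    def add_vm(self, vm, weight):
        self.queues[vm] = deque()
        self.finish[vm] = 0.0
        self.weights[vm] = weight

    def submit(self, vm, cost=1.0):
        self.queues[vm].append(cost)

    def dispatch(self):
        """Pick the backlogged VM with the smallest start tag and
        return up to `batch` of its requests as (vm, cost) pairs."""
        backlogged = [vm for vm, q in self.queues.items() if q]
        if not backlogged:
            return []
        # SFQ start tag: max(virtual time, VM's last finish tag)
        vm = min(backlogged, key=lambda v: max(self.vtime, self.finish[v]))
        self.vtime = max(self.vtime, self.finish[vm])
        out = []
        while self.queues[vm] and len(out) < self.batch:
            cost = self.queues[vm].popleft()
            # finish tag advances by cost scaled by the VM's weight
            self.finish[vm] = max(self.vtime, self.finish[vm]) \
                              + cost / self.weights[vm]
            out.append((vm, cost))
        return out
```

With pure SFQ the scheduler would interleave VMs request by request; here a newly active VM that has fallen behind in virtual time gets its pending requests serviced in a short burst.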

Ajay Gulati 2009-01-14