

Seeding Heuristics

The load-picking algorithms in Sections 3.5-3.6 generate a new load from one or more previous test loads. How, then, does the controller choose the first load to try, the seed? One option is a conservative low load, but a low seed lengthens the time spent ramping up to a high peak rate. When the benchmarking goal is to plot a response surface, the controller instead seeds the search with the peak rate of the "nearest" previously measured sample.
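The sketch below makes the ramp-up cost concrete by counting test loads for a simple search that doubles the load from the seed until the server saturates, then bisects. It is a hypothetical stand-in, not the paper's Binsearch. The predicate `saturated` is an assumed substitute for running one benchmark trial and checking whether the server misses its performance target, and the true peak of 1000 in the usage lines is invented for illustration.

```python
from typing import Callable, Tuple


def find_peak(seed: float,
              saturated: Callable[[float], bool],
              tolerance: float = 1.0) -> Tuple[float, int]:
    """Estimate the peak rate starting from `seed`.

    Hypothetical doubling-then-bisection search (not the paper's Binsearch).
    `saturated(load)` stands in for running one benchmark trial at `load`
    and reporting whether the server missed its performance target.
    Returns (highest load not observed to saturate, number of trials run).
    """
    if seed <= 0 or tolerance <= 0:
        raise ValueError("seed and tolerance must be positive")
    trials = 0
    lo, hi = 0.0, seed
    # Ramp-up phase: double the load until a trial saturates the server.
    # A low seed spends many trials here; a seed near the peak spends few.
    while True:
        trials += 1
        if saturated(hi):
            break
        lo, hi = hi, hi * 2
    # Bisection phase: narrow [lo, hi] until it is within `tolerance`.
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        trials += 1
        if saturated(mid):
            hi = mid
        else:
            lo = mid
    return lo, trials


if __name__ == "__main__":
    def saturated(load: float) -> bool:
        return load > 1000.0  # invented true peak rate, for illustration only

    for seed in (50.0, 990.0):
        peak, trials = find_peak(seed, saturated)
        print(f"seed={seed}: peak~{peak:.1f} after {trials} trials")
```

With these made-up numbers, the seed of 50 spends six trials ramping up (50, 100, ..., 1600), while a seed of 990 spends two; the bisection phase costs ten trials in both cases, so the saving comes entirely from the shorter ramp.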

To illustrate, assume that the factors of interest, $\langle F_1, \ldots, F_n \rangle$, in Algorithm 1 are $\langle$number of disks, number of nfsds$\rangle$ (as shown in Figure 2). Suppose the controller uses Binsearch with a low seed of $50$ to find the peak rate $\lambda^{*}_{1,1}$ for sample $\langle 1, 1 \rangle$. To find the peak rate $\lambda^{*}_{1,2}$ for the adjacent sample $\langle 1, 2 \rangle$, it can then use $\lambda^{*}_{1,1}$ as the seed, which lets it jump quickly to a load close to $\lambda^{*}_{1,2}$ instead of ramping up from $50$ again.
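A minimal sketch of nearest-sample seeding over quantitative factors follows. The names `nearest_seed` and `peaks` are hypothetical. The sketch assumes each sample is a tuple of factor levels such as (number of disks, number of nfsds) and measures nearness by Euclidean distance, which is one possible choice rather than one the text prescribes. The fallback seed of 50 mirrors the low seed in the example above.

```python
import math
from typing import Dict, Tuple

# Hypothetical representation: a sample is a tuple of quantitative factor
# levels, e.g. (number of disks, number of nfsds).
Sample = Tuple[float, ...]


def nearest_seed(sample: Sample,
                 peaks: Dict[Sample, float],
                 default_seed: float = 50.0) -> float:
    """Seed the search for `sample` with the peak rate of the closest
    previously measured sample, using Euclidean distance over factor
    levels (an assumed metric). Falls back to `default_seed` when no
    sample has been measured yet."""
    if not peaks:
        return default_seed
    # math.dist (Python 3.8+) computes the Euclidean distance between points.
    nearest = min(peaks, key=lambda done: math.dist(sample, done))
    return peaks[nearest]


if __name__ == "__main__":
    measured = {(1, 1): 420.0}  # invented peak rate for sample <1, 1>
    print(nearest_seed((1, 2), measured))
```

Euclidean distance treats one step in disks and one step in nfsds as equally far apart; if the factor levels have very different scales, normalizing them before computing distances would be a reasonable refinement.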

In the common case, the peak rates for "nearby" samples are close. When they are not, the load-picking algorithms may incur additional cost to recover from a bad seed. Nearness itself is not always well defined: the distance between samples can be measured when all factors are quantitative, but when some factors are categorical (e.g., file system type), there may be no meaningful nearest sample. In such cases the controller may start the search from a default seed or from an aggregate of the peak rates of previous samples.
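One way to combine these fallbacks is sketched below under stated assumptions. Each sample is split into quantitative and categorical levels (a hypothetical representation), distance is computed only among previous samples that share every categorical level, and when no such sample exists the seed is the median of all previous peak rates. Restricting distance to matching categorical levels and using the median as the aggregate are illustrative assumptions, not choices stated in the text; `pick_seed` and the file system names in the usage lines are hypothetical.

```python
import math
import statistics
from typing import Dict, Tuple

# Hypothetical representation: (quantitative levels, categorical levels),
# e.g. ((number of disks,), ("ext3",)).
Sample = Tuple[Tuple[float, ...], Tuple[str, ...]]


def pick_seed(sample: Sample,
              peaks: Dict[Sample, float],
              default_seed: float = 50.0) -> float:
    """Choose a seed load for `sample` from previously measured peak rates."""
    quant, cats = sample
    # Assumption: distance is meaningful only among samples whose
    # categorical levels all match those of the new sample.
    comparable = [done for done in peaks if done[1] == cats]
    if comparable:
        nearest = min(comparable, key=lambda done: math.dist(quant, done[0]))
        return peaks[nearest]
    if peaks:
        # No comparable sample: aggregate all previous peak rates.
        return statistics.median(peaks.values())
    # Nothing measured yet: use the conservative default seed.
    return default_seed


if __name__ == "__main__":
    measured = {((4.0,), ("ext3",)): 800.0,   # invented peak rates
                ((8.0,), ("ext3",)): 1200.0}
    print(pick_seed(((7.0,), ("ext3",)), measured))  # nearest ext3 sample
    print(pick_seed(((6.0,), ("xfs",)), measured))   # median fallback
```

The median is used here because a single unusually high or low peak rate shifts it less than it would shift the mean.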

varun 2008-05-13