Seeding Heuristics

The load-picking algorithms in Sections 3.5-3.6 generate a new load given one or more previous test loads. How does the controller generate the first load, or seed, to try? One option is a conservative low load, but a low seed increases the time spent ramping up to a high peak rate. When the benchmarking goal is to plot a response surface, the controller instead seeds the search with the peak rate of the ``nearest'' previous sample.

To illustrate, assume that the factors of interest, $\langle F_1, \ldots, F_n \rangle$, in Algorithm 1 are $\langle$number of disks, number of nfsds$\rangle$ (as shown in Figure 2). Suppose the controller uses Binsearch with a low seed to find the peak rate $\lambda^{*}_{1,1}$ for sample $\langle 1, 1 \rangle$. To find the peak rate $\lambda^{*}_{1,2}$ for sample $\langle 1, 2 \rangle$, it can then use $\lambda^{*}_{1,1}$ as the seed. Thus, the controller can jump quickly to a load value close to $\lambda^{*}_{1,2}$.

In the common case, the peak rates for ``nearby'' samples will be close. If they are not, the load-picking algorithms may incur additional cost to recover from a bad seed. The notion of ``nearness'' is also not always well defined: the distance between samples can be measured if all factors are quantitative, but not if some factors are categorical (e.g., file system type). In such cases the controller may use a default seed or an aggregate of peak rates from previous samples to start the search.
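The seed-selection policy described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the controller's actual implementation: the function name `pick_seed`, the use of Euclidean distance as the measure of ``nearness'', the mean as the aggregate, and all numeric values are hypothetical choices that the text does not specify.

```python
import math
from statistics import mean


def pick_seed(sample, history, default_seed, quantitative=True):
    """Return a seed load for `sample` (hypothetical helper).

    sample       -- tuple of factor values, e.g. (disks, nfsds)
    history      -- dict mapping previously tested samples to their peak rates
    default_seed -- conservative low load used when no history exists
    quantitative -- False if any factor is categorical, so distance is undefined
    """
    if not history:
        # First sample ever tested: fall back to the conservative low seed.
        return default_seed
    if not quantitative:
        # Nearness is not well defined: use an aggregate of past peak rates.
        # (The mean is an assumption; the text only says "an aggregate".)
        return mean(list(history.values()))
    # Assumed metric: Euclidean distance between factor tuples.
    nearest = min(history, key=lambda s: math.dist(s, sample))
    return history[nearest]


# Hypothetical peak rates, for illustration only.
history = {(1, 1): 480.0, (4, 8): 900.0}
seed_for_1_2 = pick_seed((1, 2), history, default_seed=50.0)
```

Here `seed_for_1_2` is the peak rate recorded for sample $\langle 1, 1 \rangle$, since that is the closest previously tested sample to $\langle 1, 2 \rangle$ under the assumed metric. In practice the factors would likely need to be scaled to comparable ranges before distances are computed.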

varun 2008-05-13