Cost for Mapping Response Surfaces

We also observe that Binsearch, Binsearch with Seeding, and Linear with Seeding are robust across the workloads, but the model-guided policy is unstable. This is not surprising given that the accuracy of the learned model guides the search. As Section 3.6 explains, if the model is inaccurate the search may converge slowly.

The linear policy is inefficient and highly sensitive to the magnitude of peak rate. The benchmarking cost of Linear for Web server peaks at a higher absolute value for all samples than for DB_TP and Mail, causing more than a factor of

increase in the total cost for mapping the surface. Note that for Mail, Binsearch with Seeding incurs a slightly higher cost than Binsearch. For some configurations, as Section 3.7 explains, seeding can incur additional cost to recover from a bad seed resulting in longer search times.

**Figure 8:** The total cost for mapping response surfaces for three workloads using different policies.
$\begin{figure}\centering \epsfig{file=graphs/total_time.eps, width=8cm} \vspace{-5ex}\vspace{-2ex} \end{figure}$

Reducing the Number of Samples. To evaluate the RSM approach presented in Section 4, we approximate the response surface by a quadratic curve in two dimensions: peak rate = func(number of disks, number of nfsds). We use a D-optimal design [17] from RSM to obtain the best of

and

samples out of a total of

samples for learning the response surface equation. We use Binsearch to obtain the peak rate for each.

After learning the equation, we use it to predict the peak rate at all the other samples in the surface. Table 4 presents the mean absolute percentage error in predicting the peak rate across all the samples. The results show that D-optimal designs do a very good job of picking appropriate samples, and that very little more can be learned by small increases in the number of points sampled. Improving the accuracy of the surface with limited numbers of sampled points is an area of ongoing research.

Table 4: Mean Absolute Prediction Error (MAPE) in Predicting the Peak Rate

Workload	Num. of Samples	MAPE
DB_TP
Web server
Mail