We now relate the peak rate algorithm that Section 3 describes to the larger challenge of mapping a peak rate response surface efficiently and effectively, based on Algorithm 1.
A large number of factors can affect performance,
so it is important to
sample the multi-dimensional space with care as well as to optimize
the inner loop. For
example, suppose we are mapping the impact of five factors on a file server's
peak rate, and that we sample five values for each factor. If the benchmarking
process takes an hour to find the peak rate for each factor combination, then
the total time for benchmarking is
days. An automated workbench
controller can shorten this time by pruning the sample space, planning
experiments to run on multiple hardware setups in parallel, and optimizing the
inner loop.
We consider two specific challenges for mapping a response surface:
If the benchmarking objective is to understand the overall trend of
how the peak rate is affected by certain factors of interest
--rather than finding accurate peak rate
values for each sample in
--then
Algorithm 1 can leverage
Response Surface Methodology
(RSM) [17] to
select the sample points efficiently (in Step 2).
RSM is a branch of statistics that provides principled techniques to choose a
set of samples to obtain good approximations of the
overall response surface at low cost.
For example, some RSM techniques
assume that a low-degree multivariate polynomial model-- e.g., a quadratic
equation of the form
-- approximates the surface in the
-dimensional
space. This approximation is a basis
for selecting a minimal
set of samples for the controller to obtain in order to learn a
fairly accurate model (i.e., estimate values of the
parameters in the
model). We evaluate one such RSM technique in Section 5.
It is important to note that these RSM techniques may reduce the effectiveness of the seeding heuristics described in Section 3.7. RSM techniques try to find sample points on the surface that will add the most information to the model. Intuitively, such samples are the ones that we have the least prior information about, and hence for which seeding from prior results would be least effective. We leave it to future work to explore the interactions of the heuristics for selecting samples efficiently and seeding the peak rate search for each sample.
varun 2008-05-13