
Experiment 1 (think-limited)

Think-limited replays the trace files against the storage devices with a fixed amount of think time between I/Os; the think time recorded in the trace is reproduced as-is, regardless of the target system. The I/O traces are collected through the causality engine running in a special mode: no I/O is delayed, and the COMPUTE() calls also include any synchronization time.
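For concreteness, the following is a minimal sketch of such a replay loop, assuming each trace record carries the I/O parameters and the recorded think time; the record layout and helper names are illustrative rather than the actual replayer code.

  /* Minimal sketch of a think-limited replay loop (illustrative only).
   * Each trace record carries the I/O parameters and the think time
   * recorded at trace-collection time. */
  #include <stdlib.h>
  #include <sys/types.h>
  #include <unistd.h>

  typedef struct {
      off_t      offset;    /* byte offset of the request          */
      size_t     length;    /* request size in bytes               */
      int        is_write;  /* 1 = write, 0 = read                 */
      useconds_t think_us;  /* COMPUTE() time from the trace,
                               including any synchronization time  */
  } trace_record_t;

  static void issue_io(int fd, const trace_record_t *r, char *buf)
  {
      if (r->is_write)
          pwrite(fd, buf, r->length, r->offset);
      else
          pread(fd, buf, r->length, r->offset);
  }

  void replay_think_limited(int fd, const trace_record_t *trace,
                            size_t n, size_t max_len)
  {
      char *buf = malloc(max_len);
      for (size_t i = 0; i < n; i++) {
          usleep(trace[i].think_us);   /* fixed think time between I/Os */
          issue_io(fd, &trace[i], buf);
      }
      free(buf);
  }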

Figure 9 shows the replay error of think-limited. The best result is for Pseudo, which performs little synchronization (a single barrier between the write phase and the read phase). The replay errors on the VendorA, VendorB, and VendorC storage systems are, respectively, 19%, 4%, and 7% (i.e., the trace replay time is within 19% of the application running time across all storage systems). Unfortunately, it is only for applications such as these (i.e., with few data dependencies) that think-limited does well.
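The error percentages are most naturally read as the relative difference between the replay time and the application's running time, consistent with the parenthetical above; under that assumed definition,

  \[ \mathrm{error} \;=\; \frac{\left| T_{\mathrm{replay}} - T_{\mathrm{app}} \right|}{T_{\mathrm{app}}} \times 100\% , \]

so a 19% error means the replay completes within 19% of the original running time.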

Looking now at PseudoSync, one can see the effects of synchronization. All nodes write their checkpoints in lockstep, performing a barrier synchronization after every write I/O. The errors are 82%, 23%, and 31%, indicating that synchronization, when assumed to be fixed, can lead to significant replay error when traces collected from one storage system are replayed on another.

In PseudoSyncDat, nodes synchronize between I/Os and also perform computation. The errors are 33%, 21%, and 15%. In this case, adding computation makes the replay time less dependent on synchronization.
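To make the difference between the two synthetic workloads concrete, the sketch below shows the structure of their main loops as described above; the MPI calls are assumed for illustration, and compute() and write_checkpoint_chunk() are placeholders rather than the benchmarks' actual code.

  #include <mpi.h>

  /* Placeholder phases (illustrative only). */
  static void compute(int iter)                { (void)iter; /* CPU work     */ }
  static void write_checkpoint_chunk(int iter) { (void)iter; /* one write I/O */ }

  /* PseudoSync: all nodes write checkpoints in lockstep, with a barrier
   * after every write I/O. */
  void pseudo_sync(int iterations)
  {
      for (int i = 0; i < iterations; i++) {
          write_checkpoint_chunk(i);
          MPI_Barrier(MPI_COMM_WORLD);
      }
  }

  /* PseudoSyncDat: same lockstep structure, but each iteration also
   * performs computation, making replay time less dependent on
   * synchronization. */
  void pseudo_sync_dat(int iterations)
  {
      for (int i = 0; i < iterations; i++) {
          compute(i);
          write_checkpoint_chunk(i);
          MPI_Barrier(MPI_COMM_WORLD);
      }
  }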

Figure 9: Think-limited error (Experiment 1). Applications with more synchronization (PseudoSyncDat, Fitness, and Quake) experience more error.

Fitness is a partitioned, read-only workload. Each node sequentially reads a 1 GB region of the disk, with no overlap among the nodes. The nodes proceed sequentially: node 0 reads its entire region first and then signals node 1, then node 1 reads its region and signals node 2, and so on. Ignoring these data dependencies during replay results in concurrent access from all nodes, which in this case increases performance on each storage system. The replay errors are 166%, 205%, and 40%.
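A minimal sketch of this pass-the-token structure, assuming MPI point-to-point messages for the signaling (the actual benchmark may synchronize differently), is:

  #include <mpi.h>

  static void read_region(long offset, long length)
  {
      (void)offset; (void)length;   /* sequential pread()s of the region */
  }

  /* Fitness access pattern as described above: node 0 reads its 1 GB
   * region, then signals node 1, which reads its region, and so on. */
  void fitness_phase(long region_bytes)
  {
      int rank, nprocs, token = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      if (rank > 0)            /* wait for the previous node to finish */
          MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);

      read_region((long)rank * region_bytes, region_bytes);

      if (rank < nprocs - 1)   /* signal the next node */
          MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
  }

Replaying without these dependencies turns a serialized, one-node-at-a-time access pattern into fully concurrent access, which is why the error is so large.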

Quake represents a complex application with multiple I/O phases, each with a different mix of compute and synchronization. The think-limited replay errors for Quake are 21%, 26%, and 25%. As with the other applications tested, these errors in running time translate to larger errors in terms of bandwidth and throughput. For example, in the case of Quake, think-limited transfers the same data in 79%, 74%, and 75% of the time, resulting in bandwidth and throughput differences of 27%, 35%, and 33%, respectively. This places unrealistic demands on the storage system under evaluation.
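The bandwidth figures above follow arithmetically from the running-time ratios: transferring the same amount of data in a fraction f of the original time inflates the apparent bandwidth and throughput by 1/f - 1,

  \[ \frac{1}{0.79} - 1 \approx 27\%, \qquad \frac{1}{0.74} - 1 \approx 35\%, \qquad \frac{1}{0.75} - 1 \approx 33\%. \]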

