

Throttled mode

Figure 5: ThrottledMode. ... Finally, the I/O is issued and the completion time is recorded.

When a node is being throttled, up to three pieces of information are added to its trace for each I/O. First, the compute time since the last I/O is determined (using Approach 1 or 2) and a COMPUTE(<seconds>) call is added to the trace. Second, the I/O operation and its arguments are added. Third, signaling information is added, depending on the I/O sampling period.
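The per-I/O bookkeeping described above can be sketched in Python. This is a simplified illustration, not the tracer's implementation. The `ThrottledTracer` class, the tuple record format, and the `probe` callback are all hypothetical. `probe` stands in for the dependency check and returns the ids of unthrottled nodes found blocked. Compute time is estimated naively as the gap since the previous I/O completed, not by Approach 1 or 2.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical record format: ("COMPUTE", 0.012), ("IO", "write"), ("SIGNAL", 1)
Record = Tuple[str, object]


class ThrottledTracer:
    """Hypothetical per-node tracer for a node running in throttled mode."""

    def __init__(self, period: int, probe: Callable[[], List[int]]) -> None:
        self.period = period      # I/O sampling period p (1 = delay every I/O)
        self.probe = probe        # delays the I/O; returns ids of blocked nodes
        self.trace: List[Record] = []
        self.io_count = 0
        self.last_completion = time.monotonic()

    def on_io(self, op: str, issue: Callable[[], None]) -> None:
        # 1. Compute time since the previous I/O completed (naive estimate).
        self.trace.append(("COMPUTE", time.monotonic() - self.last_completion))
        # 2. The I/O operation and its arguments.
        self.trace.append(("IO", op))
        # 3. Signaling information, but only for sampled I/Os.
        self.io_count += 1
        if self.io_count % self.period == 0:
            for node in self.probe():
                self.trace.append(("SIGNAL", node))
        # Finally, issue the I/O and record its completion time.
        issue()
        self.last_completion = time.monotonic()
```

In this sketch, compute time runs from one I/O's completion to the next I/O's arrival. Any time spent holding a sampled I/O is therefore excluded from every COMPUTE() record.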

The I/O sampling period determines how frequently the causality engine delays I/O to check for dependencies (e.g., a period of 1 means that every I/O is delayed), and therefore how many data dependencies are discovered. In general, if the sampling period is $p$, the causality engine will discover dependencies within $p$ operations of the true dependency. Because the sampling period also sets the rate of throttling, too large a sampling period can affect the compute-time calculation as well. In these cases, Approach 2 (Section 3.2.2) is preferred.
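The Python sketch below makes the "within $p$ operations" statement concrete. It uses a deliberately simplified model, which is an assumption and not a description of the causality engine. Delayed I/Os are those whose 1-based index is a multiple of $p$. A dependency on the $d$-th I/O is reported at the first delayed I/O at or after it. The helper names are hypothetical.

```python
from typing import List


def delayed_ios(n: int, period: int) -> List[int]:
    """1-based indices of the I/Os, among the first n, that are delayed."""
    return [i for i in range(1, n + 1) if i % period == 0]


def reported_at(true_dep: int, period: int) -> int:
    """Index of the first delayed I/O at or after the true dependency."""
    # Integer ceiling division: smallest multiple of period >= true_dep.
    return ((true_dep + period - 1) // period) * period


def max_error(period: int, n: int = 1000) -> int:
    """Largest gap between reported and true dependency over the first n I/Os."""
    return max(reported_at(d, period) - d for d in range(1, n + 1))
```

Under this model, a dependency is misplaced by at most $p - 1$ operations. The number of delayed I/Os, and hence watchdog round trips, falls to about $n/p$.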

When an I/O is delayed, the causality engine holds it back until every unthrottled node has either exited or blocked (a blocked node indicates that a dependency has been found). To make this determination, the causality engine of the throttled node sends a remote procedure call to a watchdog process on each unthrottled node; some nodes may have exited, others may be blocked. If a node has exited, it does not depend on the delayed I/O. Otherwise, the throttled node adds a SIGNAL(<unthrottled node id>) call to its trace, and the unthrottled node adds a corresponding WAIT(<throttled node id>) call to its own trace. Once the throttled node has received a reply from every watchdog (one per unthrottled node), the I/O is issued. Figure 5 shows the pseudocode.
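The delay-and-probe step can be sketched in Python as follows. The watchdog RPCs are replaced by local callables that return a hypothetical `running`, `blocked`, or `exited` state, and all function and parameter names are illustrative. The watchdogs are polled for simplicity. A real implementation would make actual remote calls, and each unthrottled node would append its WAIT() record to its own trace rather than to a shared dictionary.

```python
import time
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]

# Hypothetical node states reported by a watchdog.
RUNNING, BLOCKED, EXITED = "running", "blocked", "exited"


def delay_and_probe(
    self_id: int,
    watchdogs: Dict[int, Callable[[], str]],
    traces: Dict[int, List[Record]],
    issue: Callable[[], None],
    poll_interval: float = 0.001,
) -> List[int]:
    """Hold one sampled I/O until every unthrottled node has exited or blocked.

    watchdogs maps each unthrottled node id to a stand-in for the RPC to its
    watchdog; the call returns RUNNING, BLOCKED, or EXITED.
    Returns the ids of the nodes found to depend on this node.
    """
    pending = set(watchdogs)
    dependents: List[int] = []
    while pending:
        for node in sorted(pending):  # sorted() copies, so discarding is safe
            state = watchdogs[node]()
            if state == RUNNING:
                continue  # undecided; ask again on the next pass
            pending.discard(node)
            if state == BLOCKED:
                # Blocked while this I/O was held back: record the dependency.
                traces[self_id].append(("SIGNAL", node))
                traces[node].append(("WAIT", self_id))
                dependents.append(node)
            # EXITED: the node cannot depend on the delayed I/O.
        if pending:
            time.sleep(poll_interval)
    issue()  # every watchdog has replied: release the delayed I/O
    return dependents
```

An exited node is simply dropped from the pending set and leaves no record. This follows the rule above that such a node cannot depend on the delayed I/O.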

Of course, delaying I/O in this manner can produce indirect dependencies. For example, referring back to Figure 3, a sampling period of 1 will indicate that the open() call of node 1 depends on every I/O from node 0 (the open(), the two write() calls, and the close()), and the traces will be annotated accordingly. However, the only signal needed is the one following the close() operation; the redundant SIGNAL() and WAIT() calls can easily be removed in a preprocessing step before trace replay. The indirect dependencies that cannot be removed are those arising from transitive relationships. For example, if node 2 depends on node 1, and node 1 depends on node 0, the causality engine will also detect an indirect dependency between nodes 0 and 2. Although these transitive dependencies add extra SIGNAL() and WAIT() calls to the traces, they never force a node to block unnecessarily.
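The preprocessing step that removes redundant SIGNAL()/WAIT() pairs might look like the Python sketch below. It rests on several assumptions that the text does not state:

- Traces are lists of hypothetical `(kind, arg)` tuples.
- The k-th WAIT(src) in a node's trace matches the k-th SIGNAL() that src addresses to that node.
- A WAIT is redundant when a later WAIT on the same source follows it with no intervening operation, because waiting for a later signal from a node implies waiting for every earlier one.

WAITs on different sources are all kept, so transitive dependencies survive, as the text says they must.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Record = Tuple[str, object]   # e.g. ("WAIT", 0), ("SIGNAL", 1), ("IO", "open")
Pair = Tuple[int, int]        # (signaling node, waiting node)


def _flush_waits(run: List[Tuple[int, int]], dst: int, out: List[Record],
                 dropped: DefaultDict[Pair, Set[int]]) -> None:
    """Keep only the last WAIT per source in a run of consecutive WAITs."""
    last = {src: k for src, k in run}   # later entries overwrite earlier ones
    for src, k in run:
        if last[src] == k:
            out.append(("WAIT", src))
        else:
            dropped[(src, dst)].add(k)  # remember which SIGNAL to drop
    run.clear()


def prune_redundant(traces: Dict[int, List[Record]]) -> Dict[int, List[Record]]:
    """Return new traces with subsumed WAITs and their SIGNALs removed."""
    dropped: DefaultDict[Pair, Set[int]] = defaultdict(set)
    pruned: Dict[int, List[Record]] = {}

    # Pass 1: collapse runs of WAITs in each waiting node's trace.
    for dst, trace in traces.items():
        seen: DefaultDict[int, int] = defaultdict(int)  # WAITs per source so far
        run: List[Tuple[int, int]] = []                 # (src, k) of current run
        out: List[Record] = []
        for kind, arg in trace:
            if kind == "WAIT":
                run.append((arg, seen[arg]))
                seen[arg] += 1
            else:
                _flush_waits(run, dst, out, dropped)
                out.append((kind, arg))
        _flush_waits(run, dst, out, dropped)
        pruned[dst] = out

    # Pass 2: drop the SIGNALs whose matching WAITs were removed.
    for src in list(pruned):
        sent: DefaultDict[int, int] = defaultdict(int)  # SIGNALs per target so far
        kept: List[Record] = []
        for kind, arg in pruned[src]:
            if kind == "SIGNAL":
                k = sent[arg]
                sent[arg] += 1
                if k in dropped[(src, arg)]:
                    continue
            kept.append((kind, arg))
        pruned[src] = kept
    return pruned
```

The Figure 3 scenario can be modeled hypothetically as follows:

- Node 0 emits SIGNAL(1) after each of its four I/Os.
- Node 1 records four WAIT(0) calls before its open().

Under this model, the sketch keeps only the SIGNAL(1) that follows close() and a single WAIT(0).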

The proper sampling period depends on the application and the storage system. Some workloads and storage systems are more sensitive than others to changes in inter-node synchronization, so no single sampling period should be expected to work best for all of them. An iterative approach for determining the proper sampling period is presented in Section 5.


Michael Mesnier 2006-12-22