

Reliability During Disaster

Figure 6: Data loss as a result of disaster and wide-area link failure, varying link loss (50 ms one-way latency and FEC params $(r,c)=(8,3)$).

We measure reliability in two ways:

These questions are closely related; we distinguish between them as follows. The maximum amount by which the primary and mirror sites can diverge is the bandwidth-delay product of the link between them. The amount of data lost in the event of failure, however, depends on how much of this data has already been acknowledged to the application. In other words, how often can we be caught in a lie? For instance, with a remote-sync solution (synchronous mirroring), the bandwidth-delay product, and hence the primary-to-mirror divergence, may be high, yet data loss is zero. This, of course, comes at a severe cost to performance. With a local-sync solution (asynchronous or semi-synchronous mirroring), on the other hand, data loss is equal to divergence. The following experiments show that the network-sync solution with SMFS achieves a desirable middle ground between these two extremes.
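To make the distinction between divergence and data loss concrete, the following is a minimal sketch of the arithmetic for the two extremes. The 50 ms one-way latency is taken from the Figure 6 caption; the 1 Gbit/s link bandwidth, the function names, and the choice to use the one-way delay in the bandwidth-delay product are illustrative assumptions, not values or code from the evaluation. Network-sync is deliberately left out, since its loss is what the experiments measure.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, delay_s: float) -> float:
    """Bytes that can be in flight on the link at once (bandwidth x delay)."""
    return bandwidth_bits_per_s * delay_s / 8


def worst_case_loss_bytes(mode: str, divergence_bytes: float) -> float:
    """Worst-case acknowledged-but-lost data for the two extremes in the text.

    remote-sync: nothing is acknowledged before the mirror has it, so loss is 0.
    local-sync:  everything in flight was already acknowledged, so loss equals
                 the primary-to-mirror divergence.
    """
    if mode == "remote-sync":
        return 0.0
    if mode == "local-sync":
        return divergence_bytes
    raise ValueError(f"mode not modeled in this sketch: {mode!r}")


ONE_WAY_DELAY_S = 0.050      # from the Figure 6 caption
LINK_BANDWIDTH_BPS = 1e9     # hypothetical link speed, for illustration only

divergence = bandwidth_delay_product_bytes(LINK_BANDWIDTH_BPS, ONE_WAY_DELAY_S)

for mode in ("remote-sync", "local-sync"):
    loss = worst_case_loss_bytes(mode, divergence)
    print(f"{mode}: divergence {divergence / 1e6:.2f} MB, worst-case loss {loss / 1e6:.2f} MB")
```

Under these assumptions both modes see the same divergence; they differ only in how much of it the application has already been told is safe.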

Hakim Weatherspoon 2009-01-14