
  
Fault Tolerance

When the token on an Ethernet segment is lost or corrupted, the single-segment RETHER protocol's fault tolerance mechanism recovers from the fault by reintroducing the token in that segment. All the real-time connections that pass through the segment continue to work after token recovery. Therefore, multi-segment RETHER does not introduce any new problems compared to single-segment RETHER in this case. However, when network nodes crash, new mechanisms need to be devised to handle multi-segment connections in which the failed nodes participate.

For a multi-segment RETHER connection, the crashed node either is involved in the real-time connection or is not. If the failed node is an intermediate switch or an end-point of a RETHER connection, the state associated with the connection needs to be cleaned up and the connection has to be re-established, if possible. Connection re-establishment only makes sense when the failed node is one of the intermediate switches and there is an alternative route that bypasses the failed switch.

The cleanup of the state associated with a RETHER connection whose intermediate switch has crashed is triggered by the detection of this failure on all the segments to which the switch is connected. Because a RETHER switch participates in the token passing on every segment connected to it, the switch failure is detected independently on each of those segments via the fault tolerance scheme built into single-segment RETHER. The nodes that detect the failure then broadcast a message to that effect on their respective segments. Each node that was the other end-point of a sub-connection terminating at the crashed node, upon receiving such a message, frees the associated resources and sends an abort message along the next sub-connection. The abort message eventually reaches the actual end-points of the connection in either direction, and all the resources reserved for the connection are released. Just as with the termination message sent when admission fails, a message travels along the path of the connection on both sides of the failed node. If the failed node is an end-point of a real-time connection, the processing is similar, except that the clean-up message for each affected real-time connection propagates in only one direction.


  
Figure: Failure of an intermediate switch in a multi-segment connection. The failure is detected independently on all the segments the switch is connected to, in this case Segments 1 and 2.

For instance, in the figure above, suppose Gw1 crashes, Node A2 detects the failure on Segment 1, and Node B2 detects it on Segment 2. A2 and B2 inform all the nodes on their respective segments by broadcasting a message. On receiving the message, A1 terminates its sub-connection and frees the connection's associated resources. Gw2 similarly terminates the sub-connection on Segment 2 whose other end-point was Gw1. Since this is a multi-segment connection, Gw2 also sends a message to C1 to terminate the entire connection and free the reserved resources. Thus, all the real-time connections that have Gw1 on their path are terminated, and the associated state across the network is cleaned up.
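The propagation in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the RETHER implementation: a connection is modeled as an ordered list of node names, the segment broadcast and the forwarded abort messages are collapsed into a walk outward from the failed node, and the names `Connection` and `propagate_abort` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """A multi-segment real-time connection, modeled as an ordered node list."""
    path: list                                   # e.g. ["A1", "Gw1", "Gw2", "C1"]
    reserved: set = field(default_factory=set)   # nodes still holding resources

    def __post_init__(self):
        # Every node on the path holds reserved resources once the
        # connection has been admitted.
        self.reserved = set(self.path)


def propagate_abort(conn, failed):
    """Return the order in which surviving nodes release resources for conn.

    The neighbours of the failed node learn of the crash from the broadcast
    on their segment; nodes further along learn of it from the abort message
    forwarded over the next sub-connection.
    """
    if failed not in conn.path:
        return []                     # the connection does not use the failed node
    i = conn.path.index(failed)
    conn.reserved.discard(failed)     # the crashed node's own state is gone
    released = []
    # Walk away from the failed node on both sides. If the failed node is an
    # end-point, one side is empty and the abort travels in one direction only.
    toward_source = conn.path[:i][::-1]
    toward_sink = conn.path[i + 1:]
    for side in (toward_source, toward_sink):
        for node in side:
            conn.reserved.discard(node)
            released.append(node)
    return released
```

With the path A1, Gw1, Gw2, C1 and Gw1 as the failed node, A1 releases its resources on one side while Gw2 and then C1 release theirs on the other, which matches the sequence described above.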

If, on the other hand, the failed node is not involved in the real-time connection, the connection continues as before. For example, if Node A3 in the figure above fails, the real-time connection from A1 to C1 remains operative after token recovery. The failure is detected and the token is recovered locally on the segment to which the failed node is connected. The effect on any real-time sessions crossing this segment is that they cannot send or receive data during the fault recovery period.
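The three cases discussed in this section (uninvolved node, failed end-point, failed intermediate switch) can be summarized as a small decision function. The sketch below is hypothetical: `classify_failure` and the `node_segments` map are illustrative names, and the node-to-segment topology has to be supplied by the caller, since the text names only Segments 1 and 2.

```python
def classify_failure(path, failed, node_segments):
    """Classify how a node crash affects one real-time connection.

    path          -- ordered node names the connection traverses
    failed        -- name of the crashed node
    node_segments -- assumed map: node name -> set of attached segment ids

    Returns (outcome, segments), where segments lists the segments on which
    this connection is disturbed by token recovery.
    """
    # Token recovery runs on every segment the crashed node was attached to.
    recovering = node_segments[failed]

    if failed not in path:
        # The connection survives; it stalls only on recovering segments
        # that it actually crosses.
        crossed = set().union(*(node_segments[n] for n in path))
        return ("survives", sorted(crossed & recovering))
    if failed in (path[0], path[-1]):
        # End-point failure: the clean-up message travels one way only.
        return ("teardown, one direction", sorted(recovering))
    # Intermediate switch failure: clean-up travels on both sides.
    return ("teardown, both directions", sorted(recovering))
```

For the A1 to C1 connection, a crash of A3 yields "survives" with Segment 1 listed as the segment on which traffic pauses during token recovery, provided the supplied map places A3 on Segment 1.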


Tzi-cker Chiueh
1999-03-18