2.4 Restoring data

Next: 2.5 Handling downtime Up: 2 The simplified scheme Previous: 2.3 Backing up data

2.4 Restoring data

Restoration can be done from any computer in the event of the backed-up machine's total destruction. The backed-up computer's logical disk can be recovered to the new computer's local disk given a list of the original computer's current partners by using the following procedure: Contact each partner and ask for all of the backed-up computer's data. For each block stripe, attempt to decode using the erasure-correcting code the blocks with valid checksums and the highest version number in that stripe. If you succeed, write the resulting data blocks to local disk in the appropriate places. Keep repeating this process, retrying partners that were down, until additional blocks cannot result in more stripes being successfully decoded or time runs out.

It is the responsibility of the backed-up-computer maintainer to keep one or more copies of the list of current partners off-site in a security box or the like. This list is generated shortly after joining once the initial set of partners has been determined and updated occasionally as partners change.

To limit how often this list must be updated, we store the list of current partners in a special block (the master block) that is replicated on each partner and not part of any block stripe. This means that the list can be retrieved from any current partner so that the off-site list actually needs to be updated only every partner changes under the assumption that we must tolerate partners failing. If this is still too frequent, it is possible to add many additional partners that we only swap master blocks with.

Next: 2.5 Handling downtime Up: 2 The simplified scheme Previous: 2.3 Backing up data

Mark Lillibridge 2003-04-07