Replica addition

Next: Name-space containment Up: Replica set management Previous: File creation

Replica addition

The protocol for creating additional replicas for a file is run when a user tries to access a file not present in her local node. Say that a user on node wants to read file . A read or write request is always preceded by a directory lookup (during the open request) on . Thus, to create a replica, must replicate the file's parent directory. This recursive step may continue all the way up to the root directory. The locations of root replicas are maintained by the membership service (Section 3.2).

Pangaea performs a short-cut replica creation to transfer data from a nearby existing replica. To create a replica of , first discovers the file's gold replicas in the directory entry during the path-name lookup. then requests the file contents from the gold replica closest to (say ). then finds a replica closest to among its own graph neighbors (say , which may be itself) and forwards the request to , which in turn sends the contents to . At this point, replies to the user and lets her start accessing the replica. This request forwarding is performed because the directory only knows 's gold replicas, and there may be a bronze replica closer to than the gold ones.

The new copy must be integrated into the file's replica graph to be able to propagate updates to and receive updates from other replicas. Thus, in the background, chooses existing replicas of , adds edges to them, and requests them to add edges to the new replica in . The selection of peers must satisfy three goals:

Include gold replicas so that they have more choices during future short-cut replica creation.
Include nearby replicas so that updates can flow through fast network links.
Be sufficiently randomized so that, with high probability, the crash of nodes does not catastrophically disconnect the file's graph.

Pangaea satisfies all these goals simultaneously, as a replica can have multiple edges. chooses three types of peers for the new replica. First, adds an edge to a random gold replica, preferably one from a different region than , to give that gold replica more variety of regions in its neighbor set. Second, it asks a random gold replica, say , to pick the replica (among 's immediate graph neighbors) closest to . Third, asks to choose random replicas using random walks that start from and perform a series of RPC calls along graph edges. This protocol ensures that the resulting graph is edge- and node- connected, provided that it was -connected before.

Parameter trades off availability and performance. A small value increases the probability of graph disconnection (i.e., the probability that a replica cannot exchange updates with other replicas) after node failures. A large value for increases the overhead of graph maintenance and update propagation by causing duplicate update delivery. We found that offers a good balance in our prototype.

Next: Name-space containment Up: Replica set management Previous: File creation

Yasushi Saito 2002-10-08