Check out the new USENIX Web site. next up previous
Next: Name-space containment Up: Replica set management Previous: File creation

Replica addition

The protocol for creating additional replicas for a file is run when a user tries to access a file not present in her local node. Say that a user on node $S$ wants to read file $F$. A read or write request is always preceded by a directory lookup (during the open request) on $S$. Thus, to create a replica, $S$ must replicate the file's parent directory. This recursive step may continue all the way up to the root directory. The locations of root replicas are maintained by the membership service (Section 3.2).

Pangaea performs a short-cut replica creation to transfer data from a nearby existing replica. To create a replica of $F$, $S$ first discovers the file's gold replicas in the directory entry during the path-name lookup. $S$ then requests the file contents from the gold replica closest to $S$ (say $P$). $P$ then finds a replica closest to $S$ among its own graph neighbors (say $X$, which may be $P$ itself) and forwards the request to $X$, which in turn sends the contents to $S$. At this point, $S$ replies to the user and lets her start accessing the replica. This request forwarding is performed because the directory only knows $F$'s gold replicas, and there may be a bronze replica closer to $P$ than the gold ones.

The new copy must be integrated into the file's replica graph to be able to propagate updates to and receive updates from other replicas. Thus, in the background, $S$ chooses $m$ existing replicas of $F$, adds edges to them, and requests them to add edges to the new replica in $S$. The selection of $m$ peers must satisfy three goals:

Pangaea satisfies all these goals simultaneously, as a replica can have multiple edges. $S$ chooses three types of peers for the new replica. First, $S$ adds an edge to a random gold replica, preferably one from a different region than $S$, to give that gold replica more variety of regions in its neighbor set. Second, it asks a random gold replica, say $P$, to pick the replica (among $P$'s immediate graph neighbors) closest to $S$. Third, $S$ asks $P$ to choose $m-2$ random replicas using random walks that start from $P$ and perform a series of RPC calls along graph edges. This protocol ensures that the resulting graph is $m$ edge- and node- connected, provided that it was $m$-connected before.

Parameter $m$ trades off availability and performance. A small value increases the probability of graph disconnection (i.e., the probability that a replica cannot exchange updates with other replicas) after node failures. A large value for $m$ increases the overhead of graph maintenance and update propagation by causing duplicate update delivery. We found that $m=4$ offers a good balance in our prototype.

next up previous
Next: Name-space containment Up: Replica set management Previous: File creation
Yasushi Saito 2002-10-08