Check out the new USENIX Web site. next up previous
Next: Performance Up: Design Previous: Encoding Dependencies

Cycle Breaking

Any cycles found in the dependency graph represent a set of COPY commands that mutually depend on each others' completion. The only ways to find a valid ordering of the COPY commands in a cycle are to remove an edge of the cycle or remove a node (COPY command) entirely.

The cycle breaking method described in the previous section deletes an entire node to resolve a cyclic dependency. To compensate, the sender sends an ADD command with the data that should have been copied by the original command. This deletion costs $k$ bytes of compression. The sender marks the node as deleted and the process continues. In-place delta compression [2] uses another metric that finds and deletes the smallest COPY command in a cycle in order to minimize compression loss. Ip-rsync need not distinguish between nodes based on size. Rsync has a fixed block size; all COPY commands have the same length.

We implement an alternative method for breaking cycles. This ``trimming'' method removes an edge of the cycle, rather than a node. An edge occurs when the read region of one COPY overlaps the write region of another COPY. The edge is eliminated by shrinking one COPY command, so that it no longer overlaps with the other COPY. An ADD command is generated that covers the trimmed region. We call this trimming a node, because it reduces the read and write regions described by the node. Trimming preserves some of the benefit of existing COPY commands when breaking cycles. When ip-rsync with trimming detects a cycle, it scans through all nodes in that cycle. The scan examines the overlap between each pair of nodes. It trims the dependency with the least overlap. The goal of this policy is to minimize compression loss. After trimming a node, ip-rsync checks existing edges pointing to the trimmed node to ensure that dependencies have not changed. It is possible that the trimmed node no longer conflicts (overlaps) with other nodes in its edge list.

To preserve the format and encoding of COPY commands, the algorithm does not change a COPY command when trimming the corresponding graph node. Instead, the algorithm allows the COPY command to write corrupt data and repairs the corrupt bytes with an ADD command. This preserves the fixed-size block used by rsync. During the transmission phase, no changes are made to the protocol and the target copies the full block for a trimmed node, which includes corrupted bytes in the overlapping region. The ADD command generated when trimming the node repairs the corrupted data.


next up previous
Next: Performance Up: Design Previous: Encoding Dependencies
2003-04-08