Next: Design Up: In-Place Rsync: File Synchronization Previous: Introduction

Background

Ip-rsync builds on the rsync algorithm for propagating changes between two files located at different sites [18,19]. Rsync is widely used for backup and restore, as well as file transfer. Rsync finds blocks of data that occur in both the target file and the source file. It saves bandwidth and transfer time when updating the target file by not sending blocks from the source file that exist already in the target file. Rsync operates on a fixed block size in a single pass, or round, over the files. A multi-round version of rsync (mrsync) increases compression by taking multiple passes over files, halving the block size in each subsequent round [9]. Multi-round rsync detects common sections of the files at a fine granularity. However, in multi-round rsync, the sender (source) does not send unmatched data until all rounds are complete. In this way, multi-round rsync loses much of the benefit of of the interleaved transfer found in rsync. Rsync transmits unmatched data while searching within the file for matching data. Rsync outperforms mrsync when the similarities and differences between files consist of large sequences. Mrsync performs well when files consist of short matching sequences separated by small differences. Mrsync is more suitable in lower-bandwidth networks in which transfer time dominates. Although our in-place algorithm is suitable for multi-round rsync, we did not implement an ip-mrsync because of mrsync's limited adoption.

The concepts of rsync influence the design of many distributed systems. In particular, the combination of a weak and strong checksum has been used in a low-bandwidth file system [12], migrating virtual computers [15], and a transactional object store [16].

Rsync has much in common with delta compression. Both encode changes between files using COPY and ADD commands. Delta compression differs in its semantics because it compares two files that are collocated, rather than two files separated by a network. Algorithms for delta compression are based on hashing [1,10,13] or extensions to Lempel-Ziv compression [4,6,8]. The problem of compactly representing versions as a small set of changes was introduced by Tichy as the string-to-string correction problem with block move [17].

In-place reconstruction has been previously addressed for delta compression [2,3]. The problem and solution are similar to ip-rsync, because they both reorder the execution of commands to avoid conflicting COPY commands.

We feel that in-place reconstruction is more widely applicable in rsync than in delta compression. In-place delta compression can transmit data to a resource limited device. However, it precludes a resource limited sender because it requires both versions of a file to generate a delta encoding. Rsync allows versions to be synchronized between two resource-constrained devices and, therefore, may be used in peer-to-peer and serverless applications.

Next: Design Up: In-Place Rsync: File Synchronization Previous: Introduction

2003-04-08