Next: Background Up: In-Place Rsync: File Synchronization Previous: In-Place Rsync: File Synchronization

Introduction

Rsync [18,19] makes efficient file synchronization a reality. It enables administrators to propagate changes to files or directory trees. To save bandwidth and time, rsync moves a minimum amount of data by identifying common regions between a source and target file. When synchronizing files, rsync sends only the portions of the file that have changed and copies unchanged data from the previous version already on the target. Faster and more efficient methods for synchronizing copies make it easier to manage distributed replicas.

Despite rsync's efficiency, its shortcomings sometimes preclude its use. We address one specific shortcoming. Each time rsync synchronizes a file, it reserves temporary space in which it constructs the new file version. Rsync maintains two copies (one new, one old) on the target for the duration of the transfer. Rsync cannot be used without sufficient temporary space for two copies of a file.

The construction of the new target file in temporary space often renders rsync unusable on mobile devices with limited memory. A popular device by Palm contains only 16MB of memory. Keeping enough temporary space available on the Palm might require up to 8MB of free memory (Figure 1). Insufficient space often excludes the Palm from performing traditional rsync and forces a transfer of the entire file. Ironically, handheld systems, compact and convenient machines that stand to benefit from efficient propagation of updates, cannot always afford the space overhead of rsync.
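The space constraint in the Palm example reduces to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the device holds nothing but the file being synchronized and that the in-place case needs room only for the larger of the two versions.

```python
MB = 1024 * 1024


def fits_with_temp_copy(capacity, old_size, new_size):
    """Traditional rsync: the old and new versions coexist during the transfer."""
    return old_size + new_size <= capacity


def fits_in_place(capacity, old_size, new_size):
    """In-place reconstruction (assumed): only the larger version must fit."""
    return max(old_size, new_size) <= capacity


capacity = 16 * MB

# Case (a) of Figure 1: a 3MB file plus a 3MB temporary copy fits in 16MB.
case_a = fits_with_temp_copy(capacity, 3 * MB, 3 * MB)

# Case (b) of Figure 1: a 9MB file plus a 9MB temporary copy needs 18MB.
case_b = fits_with_temp_copy(capacity, 9 * MB, 9 * MB)

# The same 9MB file fits when it is rebuilt in the space it already occupies.
case_b_in_place = fits_in_place(capacity, 9 * MB, 9 * MB)
```

Under these assumptions, 8MB is the largest file that can still be synchronized with a temporary copy on a 16MB device.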

Rsync also cannot back up or replicate block devices. Although the benefits of compression make rsync well suited to the task, systems rarely have spare block devices on which to put temporary data.

Figure 1: With 16MB of memory: (a) There is enough space for the file $F_t$ (3MB) and a temporary copy $F_t'$. (b) There is insufficient space for a temporary copy $F_t'$ of $F_t$ (9MB) and rsync cannot be performed.

We have modified rsync so that it performs file synchronization tasks with in-place reconstruction. We call this in-place rsync or ip-rsync. Instead of using temporary space, the changes to the target file take place in the space already occupied by the current version. This tool can be used to synchronize devices where space is limited.

In-place reconstruction eliminates the need for additional storage by using the space already occupied by the file [2,3]. In-place reconstruction seems trivial, but the process must account for hazards that arise when moving a block of data from its original location in the old file to its location in the new file, an operation called a COPY command. A COPY of length $k$ not only reads a block of $k$ bytes from the file but also overwrites $k$ bytes of it. Overwritten regions cannot serve as the source of later COPY commands because they no longer contain the original data.
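The hazard can be reproduced in a few lines. The following is a minimal sketch, not ip-rsync's code: each COPY command is modeled as a hypothetical tuple (source offset, destination offset, length), and the two helper functions are illustrative names. It compares reconstruction in temporary space with a naive in-place application of the same commands.

```python
def apply_with_temp_space(old, copies, new_len):
    """Build the new version in a separate buffer; reads always see the old data."""
    new = bytearray(new_len)
    for src, dst, length in copies:
        new[dst:dst + length] = old[src:src + length]
    return bytes(new)


def apply_in_place(old, copies):
    """Apply the same commands, in order, to the buffer holding the old version."""
    buf = bytearray(old)
    for src, dst, length in copies:
        buf[dst:dst + length] = buf[src:src + length]
    return bytes(buf)


old = b"AAAABBBBCCCC"

# Hypothetical COPY commands as (source offset, destination offset, length).
copies = [(0, 0, 4), (0, 4, 4), (4, 8, 4)]

expected = apply_with_temp_space(old, copies, len(old))  # b'AAAAAAAABBBB'

# The second command overwrites bytes 4..7 before the third command reads
# them, so the third command copies corrupted data.
naive = apply_in_place(old, copies)  # b'AAAAAAAAAAAA'

# Executing the commands in reverse order performs every read before the
# conflicting write and yields the correct file.
reordered = apply_in_place(old, list(reversed(copies)))
```

In this example reversing the commands happens to resolve every conflict; in general, a suitable order has to be computed, and one may not exist.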

The goals of the ip-rsync algorithm are to (1) prevent the copying of corrupted data, that is, data already overwritten by another COPY operation, and (2) minimize compression loss. To prevent the copying of corrupted data, ip-rsync identifies COPY commands that write into regions from which other COPY commands read and then performs the read operation (on the original data) before executing the write. It is not always possible to reorder COPY commands to avoid all conflicts. In this case, ip-rsync discards the conflicting COPY operation, and the data corresponding to that COPY are sent from the source to the target. Sending the additional data, instead of copying from the file already on the target, reduces compression and increases the time needed to synchronize files. Ip-rsync implements several heuristics that select which COPY commands to eliminate so as to minimize compression loss.
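One way to picture the reordering and discarding steps is the sketch below. It is an illustration under stated assumptions rather than ip-rsync's implementation: COPY commands are hypothetical (source offset, destination offset, length) tuples whose destination regions do not overlap one another, a discarded COPY becomes a literal write applied after all copies, the literal bytes are taken from the old data as a stand-in for what the source would send, and "drop the shortest COPY" is a placeholder heuristic, not one the paper specifies here.

```python
def overlaps(a_start, a_len, b_start, b_len):
    """True if the byte ranges [a_start, a_start+a_len) and [b_start, b_start+b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def schedule_in_place(old, copies):
    """Order COPY commands so every read happens before the conflicting write.

    Returns (ordered_copies, adds); adds holds (dst, literal_bytes) for each
    COPY that had to be discarded because reordering could not resolve it.
    """
    remaining = set(range(len(copies)))
    # readers[i]: commands that read from the region command i overwrites.
    readers = {
        i: {j for j in remaining if j != i
            and overlaps(copies[j][0], copies[j][2], copies[i][1], copies[i][2])}
        for i in remaining
    }
    ordered, adds = [], []
    while remaining:
        # A command may run once no pending command still reads what it overwrites.
        ready = [i for i in sorted(remaining) if not (readers[i] & remaining)]
        if ready:
            i = ready[0]
            ordered.append(copies[i])
        else:
            # Every pending command is blocked (a cycle): discard one COPY.
            # Placeholder heuristic (assumption): drop the shortest command.
            i = min(remaining, key=lambda n: (copies[n][2], n))
            src, dst, length = copies[i]
            adds.append((dst, old[src:src + length]))
        remaining.remove(i)
    return ordered, adds


def apply_schedule(old, ordered, adds):
    """Run the ordered copies in place, then write the literal data."""
    buf = bytearray(old)
    for src, dst, length in ordered:
        buf[dst:dst + length] = buf[src:src + length]
    for dst, literal in adds:
        buf[dst:dst + len(literal)] = literal
    return bytes(buf)


old = b"AAAABBBBCCCC"
# The first two commands swap two blocks and cannot both be kept in any order.
copies = [(0, 4, 4), (4, 0, 4), (4, 8, 4)]
ordered, adds = schedule_in_place(old, copies)
result = apply_schedule(old, ordered, adds)  # b'BBBBAAAABBBB'
```

Here the two swapped blocks form a cycle, so one of them is discarded and its four bytes are transferred as literal data. That transfer is the compression loss the heuristics aim to keep small.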

We describe the in-place rsync utility as an extension to rsync. We start with an overview of the rsync algorithm and a discussion of its performance optimizations. We follow with our algorithm for performing rsync in-place and a discussion of the effect of in-place reconstruction on the algorithm's optimizations.


2003-04-08