Application-Specific Delta-Encoding via Resemblance Detection
Many objects, such as files, electronic messages, and web pages, contain overlapping content. Numerous past research projects have observed that one can compress one object relative to another one by computing the differences between the two, but these delta-encoding systems have almost invariably required knowledge of a specific relationship between them--most commonly, two versions using the same name at different points in time. We consider cases in which this relationship is determined dynamically, by efficiently determining when a sufficient resemblance exists between two objects in a relatively large collection. We look at specific examples of this technique, namely web pages, email, and files in a file system, and evaluate the potential data reduction and the factors that influence this reduction. We find that delta-encoding using this resemblance detection technique can improve on simple compression by up to a factor of two, depending on workload, and that a small fraction of objects can potentially account for a large portion of these savings.
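The idea in the abstract, picking a delta-encoding base by resemblance rather than by name, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exact Jaccard similarity over fixed-length byte shingles as the resemblance measure, and it uses zlib's preset-dictionary support as a stand-in for a real delta encoder. The function names, the shingle length `k`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import zlib


def shingles(data: bytes, k: int = 8) -> set:
    """Return the set of all k-byte substrings (shingles) of data."""
    if len(data) < k:
        return {data}
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def resemblance(a: bytes, b: bytes, k: int = 8) -> float:
    """Jaccard similarity of the two shingle sets, a value in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def best_base(target: bytes, candidates, threshold: float = 0.3):
    """Return the candidate most similar to target, or None if no
    candidate reaches the (assumed) resemblance threshold."""
    best, best_score = None, threshold
    for candidate in candidates:
        score = resemblance(target, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


def delta_encode(target: bytes, base: bytes) -> bytes:
    """Compress target with base as a preset dictionary (delta stand-in)."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def delta_decode(delta: bytes, base: bytes) -> bytes:
    """Recover the target from its delta and the same base object."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()
```

With a base chosen this way, the encoded target can be compared against plain `zlib.compress(target)` to estimate the savings for a given workload. Comparing every pair of objects exactly, as this sketch does, would not scale to a large collection; a practical system would need a compact summary of each object's shingle set.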
author = {Fred Douglis and Arun Iyengar},
title = {{Application-Specific} {Delta-Encoding} via Resemblance Detection},
booktitle = {2003 USENIX Annual Technical Conference (USENIX ATC 03)},
year = {2003},
address = {San Antonio, TX},
url = {https://www.usenix.org/conference/2003-usenix-annual-technical-conference/application-specific-delta-encoding-resemblance},
publisher = {USENIX Association},
month = jun
}