USENIX 2003 Annual Technical Conference, General Track Abstract
Pp. 113-126 of the Proceedings
Application-specific Delta-encoding via Resemblance Detection
Fred Douglis and Arun Iyengar IBM T.J. Watson Research Center
Many objects, such as files, electronic messages, and web pages, contain overlapping content. Numerous past research projects have observed that one can compress one object relative to another one by computing the differences between the two, but these delta-encoding systems have almost invariably required knowledge of a specific relationship between them--most commonly, two versions using the same name at different points in time. We consider cases in which this relationship is determined dynamically, by efficiently determining when a sufficient resemblance exists between two objects in a relatively large collection. We look at specific examples of this technique, namely web pages, email, and files in a file system, and evaluate the potential data reduction and the factors that influence this reduction. We find that delta-encoding using this resemblance detection technique can improve on simple compression by up to a factor of two, depending on workload, and that a small fraction of objects can potentially account for a large portion of these savings.
- View the full text of this paper in HTML and
Until June 2004, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, © 2003 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.
- If you need the latest Adobe Acrobat Reader, you can download it from Adobe's site.