Check out the new USENIX Web site. Check out the new USENIX Web site.

USENIX Home . About USENIX . Events . membership . Publications . Students
USENIX 2004 Annual Technical Conference, General Track — Abstract

Pp. 73–86 of the Proceedings

Alternatives for Detecting Redundancy in Storage Systems Data

Calicrates Policroniades and Ian Pratt, Cambridge University


Storage systems frequently maintain identical copies of data. Identifying such data can assist in the design of solutions in which data storage, transmission, and management are optimised. In this paper we evaluate three methods used to discover identical portions of data: whole file content hashing, fixed size blocking, and a chunking strategy that uses Rabin fingerprints to delimit content-defined data chunks. We assess how effective each of these strategies is in finding identical sections of data. In our experiments, we analysed diverse data sets from a variety of different types of storage systems including a mirrored section of, different data profiles in the file system infrastructure of the Cambridge University Computer Laboratory, source code distribution trees, compressed data, and packed files. We report our experimental results and present a comparative analysis of these techniques. This study also shows how levels of similarity differ between data sets and file types. Finally, we discuss the advantages and disadvantages in the application of these methods in the light of our experimental results.
  • View the full text of this paper in HTML and PDF.
    The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.

  • If you need the latest Adobe Acrobat Reader, you can download it from Adobe's site.
To become a USENIX Member, please see our Membership Information.

?Need help? Use our Contacts page.

Last changed: 25 June 2004 ch
Technical Program
USENIX '04 Home