Check out the new USENIX Web site.
USENIX, The Advanced Computing Systems Association

4th USENIX Symposium on Networked Systems Design & Implementation

Pp. 15–28 of the Proceedings

Exploiting Similarity for Multi-Source Downloads Using File Handprints

Himabindu Pucha, Purdue University; David G. Andersen, Carnegie Mellon University; Michael Kaminsky, Intel Research Pittsburgh

Abstract

Many contemporary approaches for speeding up large file transfers attempt to download chunks of a data object from multiple sources. Systems such as BitTorrent quickly locate sources that have an exact copy of the desired object, but they are unable to use sources that serve similar but non-identical objects. Other systems automatically exploit cross-file similarity by identifying sources for each chunk of the object. These systems, however, require a number of lookups proportional to the number of chunks in the object and a mapping for each unique chunk in every identical and similar object to its corresponding sources. Thus, the lookups and mappings in such a system can be quite large, limiting its scalability.

This paper presents a hybrid system that provides the best of both approaches, locating identical and similar sources for data objects using a constant number of lookups and inserting a constant number of mappings per object. We first demonstrate through extensive data analysis that similarity does exist among objects of popular file types, and that making use of it can sometimes substantially improve download times. Next, we describe handprinting, a technique that allows clients to locate similar sources using a constant number of lookups and mappings. Finally, we describe the design, implementation and evaluation of Similarity-Enhanced Transfer (SET), a system that uses this technique to download objects. Our experimental evaluation shows that by using sources of similar objects, SET is able to significantly out-perform an equivalently configured BitTorrent.

  • View the full text of this paper in HTML and PDF. Listen to the presentation in MP3 format.
    Click here if you have forgotten your password Until April 2008, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, © 2007 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.
Join Now! To become a USENIX member, please see our Membership Information.
Last changed: 29 May 2007 ac