Skip to main content
Back to USENIX
  • Conferences
  • Students
Sign in

USENIX Conference Policies

  • Event Code of Conduct
  • Conference Network Policy
  • Statement on Environmental Responsibility Policy

Hybrid Global-Local Indexing for Efficient Peer-to-Peer Information Retrieval

Content-based full-text search still remains a particularly challenging problem in peer-to-peer (P2P) systems. Traditionally, there have been two index partitioning structures—partitioning based on the document space or partitioning based on keywords. The former requires search of every node in the system to answer a query whereas the latter transmits a large amount of data when processing multi-term queries. In this paper, we propose eSearch—a P2P keyword search system based on a novel hybrid indexing structure. In eSearch, each node is responsible for certain terms. Given a document, eSearch uses a modern information retrieval algorithm to select a small number of top (important) terms in the document and publishes the complete term list for the document to nodes responsible for those top terms. This selective replication of term lists allows a multi-term query to proceed local to the nodes responsible for query terms. We also propose automatic query expansion to alleviate the degradation of quality of search results due to the selective replication, overlay source multicast to reduce the cost of disseminating term lists, and techniques to balance term list distribution across nodes.

eSearch is scalable and efficient, and obtains search results as good as state-of-the-art centralized systems. Despite the use of replication, eSearch actually consumes less bandwidth than systems based on keyword partitioning when publishing metadata for a document. During a retrieval operation, it searches only a small number of nodes and typically transmits a small amount of data (3.3KB) that is independent of the size of the corpus and grows slowly (logarithmically) with the number of nodes in the system. eSearch's efficiency comes at a modest storage cost, 6.8 times that of systems based on keyword partitioning. This cost can be further reduced by adopting index compression or pruning techniques.

Chunqiang Tang, University of Rochester

Sandhya Dwarkadas, University of Rochester

BibTeX
@inproceedings {270026,
author = {Chunqiang Tang and Sandhya Dwarkadas},
title = {Hybrid {Global-Local} Indexing for Efficient {Peer-to-Peer} Information Retrieval},
booktitle = {First Symposium on Networked Systems Design and Implementation (NSDI 04)},
year = {2004},
address = {San Francisco, CA},
url = {https://www.usenix.org/conference/nsdi-04/hybrid-global-local-indexing-efficient-peer-peer-information-retrieval},
publisher = {USENIX Association},
month = mar
}
Download

Links

Paper: 
http://usenix.org/publications/library/proceedings/nsdi04/tech/full_papers/tang/tang.pdf
Paper (HTML): 
http://usenix.org/publications/library/proceedings/nsdi04/tech/full_papers/tang/tang_html/index.html
  • Log in or register to post comments

© USENIX
EIN 13-3055038

  • Privacy Policy
  • Contact Us