BloomCast: Efficient and Effective Full-Text Retrieval in Unstructured P2P Networks

Efficient and effective full-text retrieval in unstructured peer-to-peer (P2P) networks remains a challenge in the research community. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on placing a large number of item replicas across the wide area network, incurring large communication and storage costs. In this paper, we propose BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks.

By leveraging a hybrid P2P protocol, BloomCast replicates the items uniformly at random across the P2P network, achieving a guaranteed recall at a communication cost of O(√N), where N is the size of the network. Furthermore, by casting Bloom filters instead of the raw documents across the network, BloomCast significantly reduces the communication and storage costs of replication. We demonstrate the power of the BloomCast design through both mathematical proof and comprehensive simulations based on the query logs from a major commercial search engine and the NIST TREC WT10G data collection. Results show that BloomCast achieves an average query recall of 91 percent, which outperforms the existing WP algorithm by 18 percent, and reduces the search latency for query processing by 57 percent.
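To illustrate why casting a Bloom filter is cheaper than casting a raw document, the following is a minimal sketch of a filter that encodes a document's terms in a fixed-size bit array. The class name `TermBloomFilter`, the parameters `numBits` and `numHashes`, and the FNV-1a double-hashing scheme are all assumptions made for illustration; they are not taken from the BloomCast design itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: a Bloom filter over the terms of one document.
class TermBloomFilter {
    private final BitSet bits;
    private final int numBits;    // size of the bit array (assumed parameter)
    private final int numHashes;  // number of hash probes per term (assumed parameter)

    TermBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded 32-bit FNV-1a hash of the term's UTF-8 bytes.
    private static int fnv1a(String term, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th probe position is (h1 + i * h2) mod numBits.
    private int index(String term, int i) {
        int h1 = fnv1a(term, 0);
        int h2 = fnv1a(term, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String term) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(term, i));
        }
    }

    // May return a false positive, but never a false negative.
    boolean mightContain(String term) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    // A multi-term query matches only if every term may be present.
    boolean mightContainAll(List<String> terms) {
        for (String t : terms) {
            if (!mightContain(t)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TermBloomFilter filter = new TermBloomFilter(1024, 4);
        for (String term : Arrays.asList("peer", "network", "retrieval")) {
            filter.add(term);
        }
        System.out.println(filter.mightContainAll(Arrays.asList("peer", "retrieval")));
    }
}
```

A node holding such a filter can answer "might this document match the query?" without storing the document text. A positive answer may be a false positive, so the query would still need to be confirmed at the node that owns the document.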

Existing System:

The existing system has two major issues. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on placing a large number of item replicas across the wide area network, incurring large communication and storage costs. Existing P2P search schemes fall into two classes: DHT-based global indexes and federated search engines over unstructured protocols. DHT-based search engines rely on distributed indexes that partition a logically global inverted index in a physically distributed manner.

In a federated search engine over an unstructured P2P network, queries are processed by flooding. Unstructured P2P networks are commonly believed to be the best candidate for supporting full-text retrieval because query evaluation can be handled at the nodes that store the relevant documents. Replication strategies are extensively used to improve search performance in unstructured P2P networks. The first type consists of query-popularity-aware strategies. The second type is independent of query popularity, such as the WP scheme.
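Flooding-based query processing can be sketched as a hop-limited breadth-first traversal of the overlay. The example below is only an illustration: the class `FloodingSearch`, the adjacency-map representation, and the `ttl` parameter are hypothetical and do not describe any particular deployed protocol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: TTL-limited flooding over an unstructured overlay.
class FloodingSearch {

    // Returns every node the query reaches within `ttl` hops of `source`.
    static Set<Integer> flood(Map<Integer, List<Integer>> neighbors, int source, int ttl) {
        Set<Integer> visited = new HashSet<>();
        visited.add(source);
        List<Integer> frontier = new ArrayList<>();
        frontier.add(source);
        for (int hop = 0; hop < ttl && !frontier.isEmpty(); hop++) {
            List<Integer> next = new ArrayList<>();
            for (int node : frontier) {
                for (int nb : neighbors.getOrDefault(node, Collections.<Integer>emptyList())) {
                    // Forward the query only to nodes that have not seen it yet.
                    if (visited.add(nb)) {
                        next.add(nb);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // A four-node line overlay: 0 - 1 - 2 - 3.
        Map<Integer, List<Integer>> overlay = new HashMap<>();
        overlay.put(0, Arrays.asList(1));
        overlay.put(1, Arrays.asList(0, 2));
        overlay.put(2, Arrays.asList(1, 3));
        overlay.put(3, Arrays.asList(2));
        System.out.println(flood(overlay, 0, 2));
    }
}
```

The set of reached nodes grows with the hop limit, so a rare item can be missed when the limit is small, while a large limit contacts much of the network. This is the trade-off that replication strategies try to ease.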

Proposed System:

In the proposed system, we introduce BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks. BloomCast is a query-popularity-independent replication strategy. We show mathematically that recall can be guaranteed at a communication cost of O(√N), where N is the size of the network.
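One standard way to see why a cost on the order of √N can suffice is a birthday-paradox style argument. If an item is replicated on r nodes chosen uniformly at random and a query probes q distinct random nodes, the chance that the two sets miss each other is roughly exp(-r·q/N). The sketch below computes the exact hit probability under that simplified model. The class `SqrtReplicationRecall`, the parameters `r` and `q`, and the model itself are assumptions for illustration, not the proof referred to above.

```java
// Hypothetical sketch: hit probability when r random replicas meet q random probes.
class SqrtReplicationRecall {

    // Probability that at least one of q distinct probed nodes holds one of
    // r replicas placed on distinct nodes chosen uniformly at random among n.
    static double hitProbability(int n, int r, int q) {
        if (r + q > n) {
            return 1.0; // pigeonhole: the two sets must overlap
        }
        double miss = 1.0;
        for (int i = 0; i < q; i++) {
            // The i-th probe misses all replicas, given the earlier probes missed.
            miss *= (double) (n - r - i) / (n - i);
        }
        return 1.0 - miss;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int sqrtN = (int) Math.round(Math.sqrt(n));
        for (int c = 1; c <= 3; c++) {
            int k = c * sqrtN; // replicas and probes both scale with sqrt(N)
            System.out.printf("r = q = %d: hit probability = %.4f%n",
                    k, hitProbability(n, k, k));
        }
    }
}
```

Under this model, setting both r and q to a small constant multiple of √N already gives a hit probability close to 1, while the number of messages per query stays proportional to √N.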

Modules:

  • Node creation
  • BloomCast replication model generation
  • BloomCast
  • Bloom filter
  • Query recall

Tools Used:

Front End : Java