
Basic article information

  • Title: Realizing Peer-to-Peer and Distributed Web Crawler
  • Authors: Anup A Garje; Bhavesh Patel; B. B. Meshram
  • Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
  • Print ISSN: 2278-1323
  • Year of publication: 2012
  • Volume: 1
  • Issue: 4
  • Pages: 353-357
  • Publisher: Shri Pannalal Research Institute of Technology
  • Abstract: The tremendous growth of the World Wide Web has made tools such as search engines and information retrieval systems essential. In this dissertation, we propose a fully distributed, peer-to-peer architecture for web crawling. The main goal behind the development of such a system is to provide an efficient, easily implementable, decentralized alternative for crawling, indexing, caching and querying web pages. The main function of a web crawler is to recursively visit web pages, extract all URLs from each page, parse the page for keywords and visit the extracted URLs recursively. We propose an architecture that can be easily implemented on a local (campus) network and that follows a fully distributed, peer-to-peer design. The architecture specifications, implementation details, requirements to be met and an analysis of such a system are discussed.
  • Keywords: Peer to peer; Distributed; Crawling; Indexing
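
The crawler function described in the abstract (visit a page, extract its URLs, parse it for keywords, then visit the extracted URLs) can be illustrated with a minimal single-node sketch. This is not the authors' implementation and says nothing about how their peers divide work: the names `PageParser`, `crawl`, `fetch`, `max_pages` and the `campus.example` pages are all assumptions made for illustration, and the sketch uses an explicit queue rather than literal recursion so that the visit order and page limit are easy to control.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re


class PageParser(HTMLParser):
    """Collects href targets and words from the text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Treat every alphabetic token in the page text as a keyword.
        self.words.extend(w.lower() for w in re.findall(r"[A-Za-z]+", data))


def crawl(seed, fetch, max_pages=100):
    """Visit pages reachable from `seed`; return {keyword: set of URLs}.

    `fetch` is a hypothetical callable mapping a URL to its HTML,
    or to None when the page is unavailable.
    """
    index = {}
    seen = {seed}
    queue = deque([seed])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited += 1
        parser = PageParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:       # never queue a URL twice
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://campus.example/a": '<a href="/b">peer</a> crawler',
    "http://campus.example/b": '<a href="/a">back</a> index',
}

index = crawl("http://campus.example/a", PAGES.get)
```

Passing `fetch` as a parameter keeps the traversal logic separate from network access; a real deployment would substitute an HTTP client and add politeness rules such as robots.txt handling, which this sketch omits.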