Article Information

  • Title: Design of an Agent Based Context Driven Focused Crawler
  • Authors: Naresh Chauhan; A.K. Sharma
  • Journal: BVICAM's International Journal of Information Technology
  • Print ISSN: 0973-5658
  • Publication year: 2009
  • Volume: 1
  • Issue: 1
  • Publisher: Bharati Vidyapeeth's Institute of Computer Applications and Management
  • Abstract: A focused crawler downloads web pages that are relevant to a user-specified topic. Most existing focused crawlers are keyword-driven and do not take into account the context associated with the keywords. This leads to the retrieval of a large number of web pages irrespective of whether they are logically related. Thus, a keyword-based strategy alone is not sufficient for the design of a focused crawler, as context relevance matters more to the user’s requirement. This paper proposes the design of a context-driven focused crawler (CDFC) that searches and downloads only highly related web pages, thereby reducing network traffic. It also employs a category tree, a flexible user interface showing the broad categories of topics on the web. Since CDFC downloads only the relevant and credible documents, a very small number in comparison, the proposed design significantly reduces the storage space required at the search engine side.
  • Keywords: Search engine; Crawler; Hypertext Document System; Category Tree; Software Agents