
Article Information

  • Title: Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning
  • Authors: Kumar, Mukesh; Vig, Renu
  • Journal: Journal of Emerging Technologies in Web Intelligence
  • Print ISSN: 1798-0461
  • Publication year: 2013
  • Volume: 5
  • Issue: 1
  • Pages: 70-77
  • DOI: 10.4304/jetwi.5.1.70-77
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: A focused crawler traverses the Web to collect documents related to a particular topic, and can be used to build topic-specific document collections for digital libraries and domain-specific search. General crawlers use breadth-first search to traverse the Web and gather as much information as possible. A focused crawler helps the search indexer index all documents on the World Wide Web related to a specific domain, which in turn gives the search engine's users the most complete and freshest information. In this paper we present a focused crawler that learns from previous crawl results to collect documents related to the sports domain. Crawling results for four consecutive crawls are shown. The results show a significant improvement in the crawler's precision as the number of crawling attempts increases.
  • Keywords: Web; Internet; Retrieval; Focused Web Crawler; Search Engine
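
The contrast the abstract draws between breadth-first and focused crawling can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the hub score learning named in the title. The toy link graph `PAGES`, the `TOPIC` string, the smoothed IDF formula, and the rule "a link inherits the TF-IDF relevance of the page it was found on" are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter, deque

# Hypothetical toy "web": url -> (page text, outgoing links). Not from the paper.
PAGES = {
    "seed": ("sports news portal", ["a", "b"]),
    "a": ("football match score league", ["c"]),
    "b": ("stock market shares prices", ["d"]),
    "c": ("cricket match score team", []),
    "d": ("bank interest loan rates", []),
}
# Hypothetical topic description for a sports-focused crawl.
TOPIC = "football cricket match score team league sports"


def tfidf(docs):
    """One TF-IDF vector (term -> weight) per tokenised document, smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c / len(doc) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def relevance_scores():
    """Similarity of every page to the topic. Precomputed for brevity; the
    crawler below only reads a page's score after that page has been fetched."""
    urls = list(PAGES)
    vecs = tfidf([PAGES[u][0].split() for u in urls] + [TOPIC.split()])
    return {u: cosine(vec, vecs[-1]) for u, vec in zip(urls, vecs)}


def bfs_crawl(seed, budget):
    """General crawler: visit pages in breadth-first order."""
    queue, seen, order = deque([seed]), {seed}, []
    while queue and len(order) < budget:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def focused_crawl(seed, budget):
    """Best-first crawler: links found on more relevant pages are fetched first."""
    score = relevance_scores()
    frontier, seen, order = [(0.0, seed)], {seed}, []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                # heapq is a min-heap, so negate the parent's relevance.
                heapq.heappush(frontier, (-score[url], link))
    return order
```

With a budget of three pages on this toy graph, `bfs_crawl("seed", 3)` fetches `seed`, `a`, `b`, while `focused_crawl("seed", 3)` fetches `seed`, `a`, `c`, skipping the finance page because the link to `c` was found on a highly relevant page.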