Article Information

Title: Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning
Authors: Kumar, Mukesh; Vig, Renu
Journal: Journal of Emerging Technologies in Web Intelligence
Print ISSN: 1798-0461
Publication year: 2013
Volume: 5
Issue: 1
Pages: 70-77
DOI: 10.4304/jetwi.5.1.70-77
Language: English
Publisher: Academy Publisher
Abstract: A focused crawler traverses the Web to collect documents related to a particular topic and can be used to build topic-specific document collections for digital libraries and domain-specific search. General crawlers use breadth-first search to traverse the Web and gather as much information as possible. Focused crawlers help the search indexer index all documents on the World Wide Web related to a specific domain, which in turn gives search engine users more complete and fresher information. In this paper we present a focused crawler capable of learning from previous crawl results to collect documents related to the sports domain. Crawling results for four consecutive crawls are shown. The results show a significant improvement in the crawler's precision as the number of crawling attempts increases.
Keywords: Web; Internet; Retrieval; Focused Web Crawler; Search Engine