Basic article information

Title: Self Adjusting Refresh Time Based Architecture for Incremental Web Crawler
Authors: A.K. Sharma, Ashutosh Dixit
Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Year of publication: 2008
Volume: 8
Issue: 12
Pages: 349-354
Publisher: International Journal of Computer Science and Network Security
Abstract:
Due to deficiencies in their refresh techniques [12], current crawlers add unnecessary traffic to the already overloaded Internet. Moreover, there is no reliable way to verify whether a document has been updated. This paper proposes an efficient approach for building an effective incremental web crawler [13]. The crawler selectively updates its database and/or local collection of web pages instead of periodically refreshing the collection in batch mode, thereby significantly improving the “freshness” of the collection and bringing in new pages in a more timely manner. It also detects web pages that are updated frequently and dynamically calculates each page's refresh time for its next update.
Keywords:
World Wide Web, Search engine, Incremental Crawler, Hypertext, Browser
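
Illustrative sketch: the abstract describes dynamically calculating a page's refresh time from how often it changes, but this record does not give the paper's formula. The Python below is therefore only one plausible scheme, not the authors' method. It assumes change is detected by comparing content digests, and that the interval is halved when a page has changed and doubled when it has not. The names `PageState` and `adjust_refresh`, the factor of 2, and the one-hour and 30-day bounds are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageState:
    """Per-page bookkeeping kept by the crawler (hypothetical structure)."""
    refresh_interval: float  # seconds to wait before the next visit
    checksum: str = ""       # digest of the content seen on the last visit


def content_digest(body: bytes) -> str:
    """Fingerprint the page body so two visits can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def adjust_refresh(state: PageState, body: bytes,
                   min_interval: float = 3600.0,
                   max_interval: float = 30 * 86400.0,
                   factor: float = 2.0) -> tuple[PageState, bool]:
    """Return the updated state and whether the page changed.

    Assumed rule (not taken from the paper): shrink the interval when the
    page changed, grow it when it did not, clamped to the given bounds.
    A page with no stored checksum counts as changed on its first visit.
    """
    digest = content_digest(body)
    changed = digest != state.checksum
    if changed:
        interval = max(min_interval, state.refresh_interval / factor)
    else:
        interval = min(max_interval, state.refresh_interval * factor)
    return PageState(refresh_interval=interval, checksum=digest), changed
```

Under these assumptions, a page that keeps changing drifts toward the minimum interval and is revisited often, while a static page drifts toward the maximum, which is how such a scheme would avoid refreshing the whole collection in batch mode.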