Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Year: 2014
Volume: 7
Issue: 4
Pages: 149-156
DOI: 10.14257/ijgdc.2014.7.4.14
Publisher: SERSC
Abstract: Adopting focused crawlers to search web sites is a trend in next-generation search engines. This paper describes in detail the design and implementation of a focused crawler, TargetCrawler, including its overall architecture, main modules, working processes, and two key algorithms: a duplicate-removal algorithm based on the Bloom filter and a priority-based ranking algorithm, both designed to ensure the accuracy and efficiency of web search. Experimental results show the effectiveness of the scheme.
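
The abstract names the two key algorithms but does not describe them, so the following is only a minimal Python sketch of the general techniques, not TargetCrawler's actual implementation. It assumes that duplicate removal means checking URLs against a Bloom filter before they enter the crawl frontier, and that ranking means fetching the highest-scored URL first. The class names `BloomFilter` and `Frontier`, the filter size, the number of hash functions, and the numeric `score` are all hypothetical choices made for illustration.

```python
import hashlib
import heapq
from itertools import count


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force the stride odd so it is coprime with a power-of-two num_bits.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class Frontier:
    """URL frontier: Bloom-filter deduplication plus a max-priority queue."""

    def __init__(self):
        self.seen = BloomFilter()
        self.heap = []
        self.tie = count()  # keeps insertion order among equal scores

    def push(self, url, score):
        """Queue url unless the filter reports it as already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self.heap, (-score, next(self.tie), url))
        return True

    def pop(self):
        """Return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self.heap)
        return url, -neg_score


frontier = Frontier()
frontier.push("http://example.com/a", 0.2)
frontier.push("http://example.com/b", 0.9)
duplicate_accepted = frontier.push("http://example.com/a", 0.5)
first_url, first_score = frontier.pop()
```

In this sketch a repeated URL is rejected in constant time and constant memory per URL, at the cost of occasionally skipping a URL that was never seen (a false positive). How relevance scores are computed is left open here, since the abstract does not say.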