Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Year: 2014
Volume: 9
Issue: 8
Pages: 251-260
DOI: 10.14257/ijmue.2014.9.8.22
Publisher: SERSC
Abstract: Web crawling is an important approach for collecting large-scale Web data from, and keeping up with, the rapidly expanding Internet. This paper proposes an improved shark search approach for crawling large-scale Web data, based on link clustering and tunneling technology. In this study we classify Web links, rather than downloaded Web pages, to determine relevance, which helps the crawler avoid the local optimum of the traditional shark search algorithm. The experiments show that the improved shark search algorithm provides a simple alternative for overcoming the problem of pages that are related to the given topic but are temporarily ranked low.
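For readers unfamiliar with the baseline, the sketch below illustrates the classic shark search scoring that the abstract calls the "traditional" algorithm. It is not the improved link-clustering method proposed in this paper. It assumes a hypothetical in-memory link graph (`web`) and a plain term-frequency cosine similarity; the function names and the parameters `delta`, `beta`, `gamma` and `depth` are illustrative choices. A child link's priority mixes a score inherited from its parent with a score from its anchor text and anchor context, and a depth counter lets the crawler pass through a limited number of irrelevant pages, a simple form of tunneling.

```python
import heapq
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def shark_search(graph, seed, query, max_pages=10, depth=2,
                 delta=0.5, beta=0.8, gamma=0.8):
    """Best-first crawl of graph: url -> (page_text, [(child, anchor, context), ...]).

    Returns the visited pages in crawl order as (url, rounded similarity).
    """
    # Heap entries: (-potential score, tie-breaker, url, inherited score, depth)
    frontier = [(-1.0, 0, seed, 0.0, depth)]
    visited, order, counter = set(), [], 1
    while frontier and len(order) < max_pages:
        _, _, url, inherited, d = heapq.heappop(frontier)
        if url in visited or url not in graph:
            continue
        visited.add(url)
        text, links = graph[url]
        sim = cosine(query, text)
        order.append((url, round(sim, 3)))
        # Tunneling: a relevant page resets the depth budget, an irrelevant one spends it.
        child_depth = depth if sim > 0 else d - 1
        if child_depth <= 0:
            continue
        # Children inherit a decayed score from the parent (or from the grandparent chain).
        child_inherited = delta * sim if sim > 0 else delta * inherited
        for child, anchor, context in links:
            if child in visited:
                continue
            anchor_score = cosine(query, anchor)
            context_score = 1.0 if anchor_score > 0 else cosine(query, context)
            neighborhood = beta * anchor_score + (1 - beta) * context_score
            potential = gamma * child_inherited + (1 - gamma) * neighborhood
            heapq.heappush(frontier, (-potential, counter, child, child_inherited, child_depth))
            counter += 1
    return order


# Hypothetical toy graph: page "d" is relevant but only reachable via the irrelevant page "c".
web = {
    "seed": ("web crawler design notes", [
        ("a", "focused crawler tutorial", "learn about crawling"),
        ("b", "cooking recipes", "dinner ideas"),
    ]),
    "a": ("a focused crawler ranks links by topic relevance", [
        ("c", "link index", "site map"),
    ]),
    "b": ("pasta and sauce", []),
    "c": ("directory page", [("d", "more", "focused crawler survey")]),
    "d": ("survey of focused crawler algorithms", []),
}

if __name__ == "__main__":
    print(shark_search(web, "seed", "focused crawler"))
    print(shark_search(web, "seed", "focused crawler", depth=1))
```

With `depth=2` the crawl tunnels through the irrelevant page `c` and reaches `d`; with `depth=1` the budget runs out at `c`, so `d` is never fetched. Because the priority of each child is driven largely by its parent's relevance, a crawler like this can stay near locally relevant pages, which is the local-optimum behavior the abstract says link classification is meant to avoid.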