Journal: Journal of Emerging Technologies in Web Intelligence
Print ISSN: 1798-0461
Year: 2013
Volume: 5
Issue: 4
Pages: 401-406
DOI: 10.4304/jetwi.5.4.401-406
Language: English
Publisher: Academy Publisher
Abstract: The web is exceptionally large today, and it is growing rapidly, with huge numbers of web pages and web sites added every day. Search results that are effective, factual and authentic are therefore needed. A simple crawler cannot cover every web page, as doing so would take polynomial time. To overcome these issues, this paper proposes an algorithm for building an efficient, focused, domain-specific crawler using LSI (Latent Semantic Indexing). The algorithm makes the crawler highly efficient at downloading relevant documents, thereby avoiding overhead and wasted resources, and it also increases the precision and recall of the IR system built on top of it.
Keywords: Crawling; focused crawler; latent semantic indexing; domain-specific crawler.
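The abstract does not spell out how LSI drives the crawl. The following is a minimal sketch under a common assumption: an LSI space is built from a small set of domain seed documents, and each fetched page is folded into that space and scored by cosine similarity to the seeds. The class `LsiRelevanceScorer`, the function `should_follow`, the rank `k=2`, the 0.3 threshold and the seed texts are all hypothetical illustrations, not the authors' algorithm. The truncated SVD is computed in pure Python by power iteration on A^T A so that the example has no dependencies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens only; a real crawler would also strip HTML and stop words.
    return re.findall(r"[a-z]+", text.lower())


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix (power iteration + deflation)."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy so deflation does not modify the caller's matrix
    pairs = []
    for _ in range(min(k, n)):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        v_norm = norm(v)
        v = [x / v_norm for x in v]
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            w_norm = norm(w)
            if w_norm == 0.0:
                break
            v = [x / w_norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient (v has unit length)
        if lam <= 1e-12:
            break  # the remaining spectrum is numerically zero
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass converges to the next eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


class LsiRelevanceScorer:
    """Hypothetical scorer: similarity of a page to seed documents in a rank-k LSI space."""

    def __init__(self, seed_docs, k=2):
        counts = [Counter(tokenize(d)) for d in seed_docs]
        self.vocab = sorted(set().union(*counts))
        self.index = {t: i for i, t in enumerate(self.vocab)}
        cols = [self._vector(c) for c in counts]  # columns of the term-document matrix A
        n = len(cols)
        # A^T A is only docs x docs, so for a small seed set we eigendecompose it instead of A.
        ata = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]
        self.sigmas, self.u = [], []
        for lam, v in top_eigenpairs(ata, k):
            sigma = math.sqrt(lam)  # singular value of A
            # Left singular vector: u = A v / sigma.
            u = [sum(cols[j][t] * v[j] for j in range(n)) / sigma
                 for t in range(len(self.vocab))]
            self.sigmas.append(sigma)
            self.u.append(u)
        self.seed_latent = [self._fold_in(c) for c in cols]

    def _vector(self, counter):
        # Raw term frequencies over the seed vocabulary; unseen terms are ignored.
        vec = [0.0] * len(self.vocab)
        for term, tf in counter.items():
            if term in self.index:
                vec[self.index[term]] = float(tf)
        return vec

    def _fold_in(self, vec):
        # LSI fold-in: q_k = Sigma_k^{-1} U_k^T q
        return [dot(u, vec) / s for u, s in zip(self.u, self.sigmas)]

    def score(self, page_text):
        # Highest cosine similarity to any seed document; negative similarities count as 0.
        q = self._fold_in(self._vector(Counter(tokenize(page_text))))
        best = 0.0
        for d in self.seed_latent:
            denom = norm(q) * norm(d)
            if denom > 0.0:
                best = max(best, dot(q, d) / denom)
        return best


def should_follow(scorer, page_text, threshold=0.3):
    # Hypothetical focused-crawl rule: queue a page's out-links only if the page looks on-topic.
    return scorer.score(page_text) >= threshold


seeds = [
    "focused crawler downloads relevant web pages for a domain",
    "latent semantic indexing maps terms and documents to concepts",
    "web crawler frontier ranks links by topical relevance",
]
scorer = LsiRelevanceScorer(seeds, k=2)
```

In a crawl loop, a page for which `should_follow` returns False would be kept or dropped without enqueuing its links, which is one way to avoid the wasted downloads the abstract mentions. In practice the rank `k` and the threshold would need tuning on labeled pages from the target domain.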