Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Print ISSN: 2302-9293
Year: 2014
Volume: 12
Issue: 4
Pages: 1105-1112
DOI: 10.12928/telkomnika.v12i4.981
Language: English
Publisher: Universitas Ahmad Dahlan
Abstract: The large size and dynamic nature of the Web make it necessary to continually maintain Web-based information retrieval systems. To collect more target objects while visiting as few irrelevant web pages as possible, a web crawler usually adopts a heuristic search strategy that ranks URLs by their importance and visits the more important pages first. While some systems rely on crawlers that exhaustively crawl the Web, others build a “focus” into their crawlers to harvest application-specific or topic-specific collections. In this paper, the learning ability of the Hidden Markov Model (HMM) is used to address the problem of topic drift in focused crawlers, with some positive effect.
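The abstract does not describe the paper's model, so the following is only a minimal sketch of the general idea under stated assumptions: a best-first crawler whose frontier is a priority queue ordered by a score, where the score comes from HMM forward filtering over the relevance labels of pages along the link path. The two hidden states ("on topic" / "drifted"), the binary page labels, all probabilities, the function names `on_topic_probability` and `best_first_crawl`, and the toy link graph are hypothetical and stand in for real fetching and page classification.

```python
import heapq
from typing import Dict, List

# Hypothetical two-state HMM: hidden state 0 = "on topic", 1 = "drifted".
# Observations are page labels: 0 = "relevant", 1 = "irrelevant".
# All probabilities are illustrative assumptions, not values from the paper.
START = [0.9, 0.1]
TRANS = [[0.8, 0.2],   # on topic -> on topic / drifted
         [0.3, 0.7]]   # drifted  -> on topic / drifted
EMIT = [[0.7, 0.3],    # on topic emits relevant / irrelevant
        [0.2, 0.8]]    # drifted  emits relevant / irrelevant


def on_topic_probability(path_labels: List[int]) -> float:
    """Forward filtering: P(current state = on topic | labels seen along the path)."""
    if not path_labels:
        return START[0]
    alpha = [START[s] * EMIT[s][path_labels[0]] for s in range(2)]
    for obs in path_labels[1:]:
        alpha = [sum(alpha[p] * TRANS[p][s] for p in range(2)) * EMIT[s][obs]
                 for s in range(2)]
    return alpha[0] / sum(alpha)


def best_first_crawl(seeds: List[str], links: Dict[str, List[str]],
                     labels: Dict[str, int], budget: int) -> List[str]:
    """Visit up to `budget` pages, always expanding the highest-scoring URL first.

    `links` (URL -> out-links) and `labels` (URL -> 0/1) are a hypothetical toy
    graph standing in for downloading and classifying real pages.
    """
    # heapq is a min-heap, so scores are negated; ties fall back to the URL string.
    frontier = [(-on_topic_probability([]), url, []) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, url, path = heapq.heappop(frontier)
        visited.append(url)
        new_path = path + [labels[url]]
        # Children inherit the on-topic probability of the path that led to them.
        score = on_topic_probability(new_path)
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score, child, new_path))
    return visited


if __name__ == "__main__":
    links = {"seed": ["a", "b"], "a": ["a1"], "b": ["b1"]}
    labels = {"seed": 0, "a": 0, "b": 1, "a1": 0, "b1": 1}
    print(best_first_crawl(["seed"], links, labels, budget=4))
```

On this toy graph, `b1` is reachable only through the irrelevant page `b`, so its inherited score drops (about 0.58 versus about 0.93 for `a1`) and it is left out when the budget is four pages. A real focused crawler would learn the transition and emission probabilities from training crawls rather than hard-coding them.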