Article Information

Title: Topic Information Collection based on the Hidden Markov Model
Authors: Jiang, Hai-yan; Wang, Xing-ce; Wu, Zhong-ke; et al.
Journal: Journal of Networks
Print ISSN: 1796-2056
Publication year: 2013
Volume: 8
Issue: 2
Pages: 485-492
DOI: 10.4304/jnw.8.2.485-492
Language: English
Publisher: Academy Publisher
Abstract: Subject-specific information collection is a key technology of vertical search engines and directly affects the speed and relevance of search results. Topic information collection algorithms are widely used for their accuracy. A Hidden Markov Model (HMM) is used to learn and judge the relevance between a Uniform Resource Locator (URL) and the topic information. The Rocchio method is used to construct prototype vectors relevant to the topic information, and the HMM is used to learn the preferred browsing paths. Concept maps that include the semantics of the webpages are constructed, and the web's link structure can be determined. Finally, experiments demonstrate the validity of the algorithm: compared with the Best-First algorithm, it collects more information pages and achieves a higher precision ratio.
Keywords: Topic Information Collection; Hidden Markov Model; Crawler; URL (Uniform Resource Locator); Prototype Vector; Precision Ratio; Recall Ratio