Article Information

  • Title: Topic Information Collection based on the Hidden Markov Model
  • Authors: Jiang, Hai-yan; Wang, Xing-ce; Wu, Zhong-ke
  • Journal: Journal of Networks
  • Print ISSN: 1796-2056
  • Year: 2013
  • Volume: 8
  • Issue: 2
  • Pages: 485-492
  • DOI: 10.4304/jnw.8.2.485-492
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: Topic-specific information collection is one of the key technologies of vertical search engines and directly affects the speed and relevance of their search results. Topic-focused collection algorithms are widely used because of their accuracy. In this work, a Hidden Markov Model (HMM) is used to learn and judge the relevance between a Uniform Resource Locator (URL) and the topic information. The Rocchio method is used to construct prototype vectors for the topic, and the HMM learns the preferred browsing paths. Concept maps that capture the semantics of web pages are constructed, from which the link structure of the web can be determined. Experiments confirm the validity of the algorithm: compared with the Best-First algorithm, it retrieves more relevant pages and achieves a higher precision ratio.
  • Keywords: Topic Information Collection; Hidden Markov Model; Crawler; URL (Uniform Resource Locator); Prototype Vector; Precision Ratio; Recall Ratio
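The abstract combines two ingredients: a Rocchio prototype vector that scores how close a page is to the topic, and an HMM over browsing paths that estimates whether a link is likely to lead to on-topic pages. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. The record gives none of the paper's features, states, or parameters, so everything here is an assumption: the bag-of-words features, the Rocchio weights `beta`/`gamma`, the three hidden states (link distance to an on-topic page), the similarity thresholds, all probabilities, and every function name (`rocchio_prototype`, `url_score`, `rank_frontier`, etc.) are hypothetical. The concept-map step is not modeled.

```python
import math
from collections import Counter, defaultdict

# --- Rocchio prototype vector (hypothetical bag-of-words features and weights) ---

def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def unit(vec):
    """Scale a sparse vector to unit Euclidean length (empty dict if all zeros)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def rocchio_prototype(relevant, nonrelevant, beta=1.0, gamma=0.25):
    """beta * centroid(relevant) - gamma * centroid(non-relevant); negative weights dropped."""
    proto = defaultdict(float)
    for docs, weight in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in unit(tf_vector(doc)).items():
                proto[term] += weight * w / len(docs)
    return {t: w for t, w in proto.items() if w > 0}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# --- HMM over browsing paths (hypothetical states and parameters) ---
# Hidden state s: link distance from the page to an on-topic page
#   0 = on-topic, 1 = one link away, 2 = two or more links away.
# Observation o: discretized similarity of the page to the prototype
#   0 = high, 1 = medium, 2 = low.
STATES = range(3)
START = [0.2, 0.3, 0.5]
TRANS = [[0.6, 0.3, 0.1],
         [0.4, 0.4, 0.2],
         [0.1, 0.4, 0.5]]
EMIT = [[0.7, 0.2, 0.1],
        [0.3, 0.5, 0.2],
        [0.1, 0.3, 0.6]]

def similarity_level(sim, high=0.5, low=0.2):
    """Map a cosine score to an observation symbol (thresholds are illustrative)."""
    return 0 if sim >= high else 1 if sim >= low else 2

def _normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

def forward(obs):
    """Normalized forward pass: P(hidden state of the last page | observations so far)."""
    alpha = _normalize([START[s] * EMIT[s][obs[0]] for s in STATES])
    for o in obs[1:]:
        # Predict with the transition matrix, then weight by the emission probability.
        alpha = _normalize([sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
                            for s in STATES])
    return alpha

def url_score(path_obs):
    """Probability that a page linked from the end of the path is on-topic (state 0)."""
    belief = forward(path_obs)
    return sum(belief[p] * TRANS[p][0] for p in STATES)

def rank_frontier(candidates):
    """Order URLs (url -> observations along the path that found it) by descending score."""
    return sorted(candidates, key=lambda url: url_score(candidates[url]), reverse=True)

if __name__ == "__main__":
    proto = rocchio_prototype(
        ["hidden markov model crawler", "topic crawler for vertical search"],
        ["football match report", "pasta recipe"],
    )
    sim = cosine(tf_vector("a focused crawler for topic search"), proto)
    print(round(sim, 3), similarity_level(sim))
    frontier = {
        "http://example.com/unrelated": [2, 2],
        "http://example.com/on-topic-hub": [0, 0],
    }
    print(rank_frontier(frontier))
```

In this sketch, ranking the frontier by `url_score` rather than by the parent page's similarity alone is the point of contrast with a plain Best-First crawler: the score reflects the whole observed path, so a link can rank highly when the path suggests an on-topic page is close, even if the current page itself is only moderately similar. In practice `START`, `TRANS`, and `EMIT` would be estimated from recorded browsing paths rather than fixed by hand.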