Basic article information

  • Title: An Approach of Information Extraction Based on Dom Tree and Weight Value
  • Authors: Haitao Wang; Shufen Liu
  • Journal: International Journal of Grid and Distributed Computing
  • Print ISSN: 2005-4262
  • Year of publication: 2016
  • Volume: 9
  • Issue: 10
  • Pages: 311-320
  • Publisher: SERSC
  • Abstract: Eliminating noisy information and extracting the main content from web pages have become an important research issue in the field of information retrieval. In this paper, we present an approach to information extraction based on the DOM tree and weight value calculation. It consists of the following steps: parse the web page to construct the DOM tree, extract the title and keywords, calculate the weight values, and obtain the content. The experimental results show that the method achieves a high accuracy ratio when extracting content from pages on various themes.
  • Keywords: Information extraction; Dom tree; Weight value; JSoup; Web pages
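
The four steps named in the abstract (build the DOM tree, extract the title and keywords, calculate weight values, obtain the content) can be illustrated with a minimal sketch. This record does not give the paper's weight formula, so the weight used here (topic-word hits discounted by link density) is an assumption made purely for illustration, as are the names `Node`, `TreeBuilder`, `extract_content`, `CANDIDATE_TAGS`, and `SAMPLE_HTML`. The keywords mention JSoup, a Java library; Python's standard `html.parser` is used here as a stand-in so that the example needs no third-party packages.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"meta", "br", "img", "link", "input", "hr"}
SKIP_TAGS = {"script", "style"}
CANDIDATE_TAGS = {"div", "article", "section", "td"}  # assumed content containers


class Node:
    """One element of a simplified DOM tree; children are Nodes or text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children = []

    def all_text(self):
        parts = []
        for child in self.children:
            if isinstance(child, str):
                parts.append(child)
            elif child.tag not in SKIP_TAGS:
                parts.append(child.all_text())
        return " ".join(p for p in parts if p)

    def walk(self):
        yield self
        for child in self.children:
            if isinstance(child, Node):
                yield from child.walk()


class TreeBuilder(HTMLParser):
    """Step 1: parse the page into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_content(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    # Step 2: collect the title and the <meta name="keywords"> terms.
    title, topic = "", set()
    for node in builder.root.walk():
        if node.tag == "title":
            title = node.all_text()
        elif node.tag == "meta" and (node.attrs.get("name") or "").lower() == "keywords":
            topic |= set(words(node.attrs.get("content") or ""))
    topic |= set(words(title))

    # Step 3: weight each candidate block (assumed formula, not the paper's).
    best, best_weight = None, 0.0
    for node in builder.root.walk():
        if node.tag not in CANDIDATE_TAGS:
            continue
        block_words = words(node.all_text())
        if not block_words:
            continue
        link_words = sum(len(words(a.all_text())) for a in node.walk() if a.tag == "a")
        hits = sum(1 for w in block_words if w in topic)
        weight = hits * (1 - link_words / len(block_words))
        if weight > best_weight:
            best, best_weight = node, weight

    # Step 4: the highest-weighted block is taken as the content.
    return title, (best.all_text() if best else "")


SAMPLE_HTML = """<html><head><title>DOM Tree Extraction</title>
<meta name="keywords" content="dom, tree, weight"></head>
<body>
<div id="nav"><a href="/">Home</a> <a href="/login">Login</a></div>
<div id="main"><p>The DOM tree assigns a weight to each node.</p>
<p>Extraction keeps the node with the largest weight.</p></div>
<div id="footer">Copyright notice</div>
</body></html>"""

if __name__ == "__main__":
    page_title, page_content = extract_content(SAMPLE_HTML)
    print(page_title)
    print(page_content)
```

On the sample page, the navigation and footer blocks contain no title or keyword terms and therefore receive a weight of zero, so only the `main` block is returned. A real page would likely need a richer weight (for example tag type, text length, or depth), which is where the paper's own calculation would apply.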