
Article Information

  • Title: A Structured Information Extraction Algorithm for Scientific Papers based on Feature Rules Learning
  • Authors: Chen, Jianguo ; Chen, Hao
  • Journal: Journal of Software
  • Print ISSN: 1796-217X
  • Year of publication: 2013
  • Volume: 8
  • Issue: 1
  • Pages: 55-62
  • DOI: 10.4304/jsw.8.1.55-62
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: Most traditional scientific papers are unstructured documents, which makes it difficult to meet the requirements of structured retrieval, statistical classification, association analysis, and other high-level applications. Extracting and analyzing the structured information in such papers is therefore a challenging problem. This paper proposes a structured information extraction algorithm for unstructured and/or semi-structured machine-readable documents. Based on an analysis of the basic structure and format features of traditional scientific papers, the algorithm applies extraction rules obtained through feature learning to extract the title, author, abstract, keywords, body text, and other elements from unstructured documents such as Word files. It then exports the structured text in the format required by multi-dimensional scientific papers, so that the result can meet the requirements of structured retrieval, statistical classification, and other high-level applications of scientific papers.
  • Keywords: Information Extraction; Feature Rules; Multi-dimensional Scientific Papers
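
The abstract describes rule-based extraction of the title, author, abstract, keywords, and body text from an unstructured document. The sketch below illustrates the general idea only and is not the authors' algorithm: the rules in `LABEL_RULES` are hand-written assumptions (the paper learns its rules from format features), the function name `extract_fields` is hypothetical, and the input is assumed to be a list of plain-text paragraphs already read from the document, with the title first and the author line second.

```python
import re

# Hypothetical hand-written rules. The paper derives its rules by
# feature learning; these regexes only stand in for that step.
LABEL_RULES = {
    "abstract": re.compile(r"^\s*abstract\b\s*[:\-.]?\s*", re.IGNORECASE),
    "keywords": re.compile(
        r"^\s*(?:key\s?words|index terms)\b\s*[:\-.]?\s*", re.IGNORECASE
    ),
}


def match_label(paragraph, record):
    """Return (field, body) if the paragraph starts with an unused label."""
    for field, rule in LABEL_RULES.items():
        found = rule.match(paragraph)
        if found and not record[field]:
            return field, paragraph[found.end():]
    return None, paragraph


def extract_fields(lines):
    """Assign paragraphs to title, author, abstract, keywords, and text."""
    paragraphs = [ln.strip() for ln in lines if ln.strip()]
    record = {"title": "", "author": "", "abstract": "", "keywords": [], "text": []}
    positional = []      # paragraphs that appear before any labelled field
    seen_label = False

    for paragraph in paragraphs:
        field, body = match_label(paragraph, record)
        if field == "keywords":
            record["keywords"] = [k.strip() for k in re.split(r"[;,]", body) if k.strip()]
            seen_label = True
        elif field == "abstract":
            record["abstract"] = body
            seen_label = True
        elif seen_label:
            record["text"].append(paragraph)
        else:
            positional.append(paragraph)

    # Positional assumption: first paragraph is the title, second the authors.
    if positional:
        record["title"] = positional[0]
    if len(positional) > 1:
        record["author"] = positional[1]
    return record
```

Calling `extract_fields` on the paragraphs of a paper returns a dictionary with one entry per element, which could then be serialized into whatever structured format a downstream retrieval or classification system expects. Real documents would need additional rules (font size, position, numbering patterns) that this sketch does not attempt.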