Article Information

Title: A Structured Information Extraction Algorithm for Scientific Papers based on Feature Rules Learning
Authors: Chen, Jianguo; Chen, Hao
Journal: Journal of Software
Print ISSN: 1796-217X
Publication year: 2013
Volume: 8
Issue: 1
Pages: 55-62
DOI: 10.4304/jsw.8.1.55-62
Language: English
Publisher: Academy Publisher
Abstract: Most traditional scientific papers are unstructured documents, which makes it difficult for them to meet the requirements of structured retrieval, statistical classification, association analysis, and other high-level applications. Extracting and analyzing the structured information in such papers is therefore a challenging problem. This paper proposes a structured information extraction algorithm for unstructured and/or semi-structured machine-readable documents. The algorithm analyzes the basic structure and format features of traditional scientific papers, learns extraction rules from those features, and applies the rules to extract the title, authors, abstract, keywords, body text, and other elements from unstructured documents such as Word files. It then exports the structured text in the format required by multi-dimensional scientific papers, so that the output can support structured retrieval, statistical classification, and other high-level applications.
Keywords: Information Extraction; Feature Rules; Multi-dimensional Scientific Papers
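
The abstract describes applying extraction rules to pull the title, authors, abstract, keywords, and body text out of an unstructured document. The following is a minimal sketch of that general idea only, not the authors' algorithm: the rules here are hand-written regular expressions rather than rules learned from format features, the input is assumed to be plain text already exported from a Word file, and the names `ABSTRACT_RE`, `KEYWORDS_RE`, and `extract_fields` are hypothetical.

```python
import re

# Hypothetical hand-written rules. The paper learns its rules from format
# features; these fixed patterns only stand in for that step.
ABSTRACT_RE = re.compile(r"^abstract\b\s*[:.\-]?\s*(.*)$", re.IGNORECASE)
KEYWORDS_RE = re.compile(r"^key\s?words\b\s*[:.\-]?\s*(.*)$", re.IGNORECASE)


def extract_fields(text):
    """Split plain paper text into a structured record (illustrative only)."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = {"title": "", "authors": [], "abstract": "", "keywords": [], "body": []}

    for index, line in enumerate(lines):
        abstract = ABSTRACT_RE.match(line)
        keywords = KEYWORDS_RE.match(line)
        if abstract and not record["abstract"]:
            # First line labelled "Abstract" supplies the abstract text.
            record["abstract"] = abstract.group(1)
        elif keywords and not record["keywords"]:
            # Keywords are assumed to be separated by ';' or ','.
            parts = re.split(r"[;,]", keywords.group(1))
            record["keywords"] = [p.strip() for p in parts if p.strip()]
        elif index == 0:
            # Assumption: the first non-blank line is the title.
            record["title"] = line
        elif index == 1 and not record["abstract"]:
            # Assumption: the second line lists authors, split on ';' or 'and'.
            parts = re.split(r";|\band\b", line)
            record["authors"] = [p.strip() for p in parts if p.strip()]
        else:
            record["body"].append(line)
    return record


sample = """A Sample Paper Title
Jane Doe; John Roe
Abstract: This is a short abstract.
Keywords: extraction; rules
1. Introduction
Body text here.
"""
record = extract_fields(sample)
print(record["title"])
print(record["authors"])
print(record["keywords"])
```

Positional rules like "first line is the title" are brittle; a feature-learning approach such as the one the abstract describes would presumably rely on richer cues (font, layout, position) that plain text does not preserve.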