Article Information

Title: A Two-Stage Method for Scientific Papers Analysis
Authors: Hanyurwimfura, Damien; Liao, Bo; Masabo, Emmanuel et al.
Journal: Journal of Software
Print ISSN: 1796-217X
Publication year: 2014
Volume: 9
Issue: 10
Pages: 2564-2573
DOI: 10.4304/jsw.9.10.2564-2573
Language: English
Publisher: Academy Publisher
Abstract: A considerable amount of research is conducted every day by researchers, graduate students, professors, and others. Finding information about a specific topic is one of their most time-consuming activities. People doing research have to search, read, and analyze multiple research papers, e-books, and other documents, then determine what they contain and discover knowledge from them. Many available resources take the form of long, unstructured text that requires a long time to read and analyze. In this paper we propose a two-stage method for scientific paper analysis. The method first uses information extraction to extract the key sentences that convey the paper's main ideas (the ones most readers need); the extracted information is then organized in a structured format and grouped into clusters by topic using a multi-word-based clustering method. The proposed method combines different features when extracting a paper's topics and uses a multi-word matching feature to select the initial centroids for clustering. It can help readers access and analyze multiple research papers quickly and efficiently. The experiments conducted show the effectiveness and usefulness of the proposed approach.
Keywords: text mining; information extraction; text clustering; important information; initial centroids; scientific papers
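
Illustrative sketch: the abstract mentions selecting initial centroids through multi-word matching but gives no algorithmic detail, so the Python below is only one plausible reading and not the authors' implementation. It assumes that a "multi-word term" is an adjacent word pair containing no stopword, that matching is measured by Jaccard overlap, and that seeds are chosen greedily to be mutually dissimilar. The stopword list and all function names (`multiword_terms`, `overlap`, `pick_initial_centroids`, `assign`) are hypothetical.

```python
# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "and", "to", "on", "with", "from"}


def multiword_terms(text):
    """Return the set of adjacent word pairs in which neither word is a stopword."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    return {
        (a, b)
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }


def overlap(a, b):
    """Jaccard overlap between two sets of multi-word terms."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_initial_centroids(term_sets, k):
    """Greedy seeding (an assumption, not taken from the paper).

    Start with the document that has the most multi-word terms, then
    repeatedly add the document whose best match with any chosen seed
    is the weakest, so the seeds cover different topics.
    """
    seeds = [max(range(len(term_sets)), key=lambda i: len(term_sets[i]))]
    while len(seeds) < k:
        candidates = [i for i in range(len(term_sets)) if i not in seeds]
        nxt = min(
            candidates,
            key=lambda i: max(overlap(term_sets[i], term_sets[s]) for s in seeds),
        )
        seeds.append(nxt)
    return seeds


def assign(term_sets, seeds):
    """Label each document with the index of the seed it matches best."""
    return [max(seeds, key=lambda s: overlap(t, term_sets[s])) for t in term_sets]
```

A full clustering method would normally iterate (recompute centroids and reassign) after this seeding step; the sketch stops at the first assignment to keep the role of multi-word matching visible.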