Article information

  • Title: NLP AND IR BASED SOLUTION FOR CONFIRMING CLASSIFICATION OF RESEARCH PAPERS
  • Authors: KHALID M.O. NAHAR ; NOUH ALHINDAWI ; OBAIDA M. AL-HAZAIMEH
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year of publication: 2018
  • Volume: 96
  • Issue: 16
  • Publisher: Journal of Theoretical and Applied
  • Abstract: This paper presents an approach for classifying and categorizing research papers accurately. Papers are typically grouped into clusters by concept and content, and this clustering depends mainly on the paper's title. However, many papers have ambiguous or very short titles. The researcher therefore needs to classify papers using not just the title but also other parts of the paper, such as the abstract, the keywords, and possibly some key sections. Doing this manually is time-consuming, since researchers spend considerable time deciding which cluster a given paper belongs to. The presented approach provides an automatic, fast, and accurate solution that relies mainly on Information Retrieval (IR) as its core process, along with some Natural Language Processing (NLP) techniques. Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) are the two IR algorithms used in the new approach: LDA classifies the papers through topic modeling, and LSI performs querying. The approach uses the title, the abstract, and the keywords of each paper for classification. Two distinct experiments were conducted on 600 computer science papers. The results show that the proposed approach classifies and maps the papers accurately and efficiently.
  • Keywords: NLP and Information Retrieval (IR); Classification; Topic Modeling; Latent Dirichlet Allocation; Latent Semantic Indexing; Gensim
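
The abstract describes classifying papers by running LDA topic modeling over each paper's title, abstract, and keywords. As a rough illustration of that one step, the sketch below implements a small collapsed Gibbs sampler for LDA using only the Python standard library. It is not the authors' implementation: the record lists Gensim among the keywords, while this stand-in is hand-written, and the toy `papers` records, the `STOPWORDS` set, the helper names `tokenize` and `lda_gibbs`, and all hyperparameter values are assumptions made for illustration. The LSI querying step is not covered.

```python
import random
from collections import Counter

# Hypothetical toy records; the paper's 600-paper corpus is not reproduced here.
papers = [
    {"title": "Topic models for text collections",
     "abstract": "latent topics describe documents in large text collections",
     "keywords": "topic modeling; text mining"},
    {"title": "Routing in wireless sensor networks",
     "abstract": "energy aware routing protocols for sensor networks",
     "keywords": "wireless networks; routing"},
    {"title": "Document clustering with latent topics",
     "abstract": "clustering documents by topics learned from text",
     "keywords": "clustering; topic modeling"},
    {"title": "Protocols for ad hoc networks",
     "abstract": "routing protocols and energy use in wireless networks",
     "keywords": "networks; protocols"},
]

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"for", "in", "with", "and", "by", "from", "the", "of"}

def tokenize(paper):
    """Join title, abstract, and keywords, then lowercase and split on non-letters."""
    text = " ".join([paper["title"], paper["abstract"], paper["keywords"]])
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOPWORDS and len(w) > 2]

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

docs = [tokenize(p) for p in papers]
theta = lda_gibbs(docs, num_topics=2)
# Assign each paper to its most probable topic.
labels = [max(range(len(row)), key=row.__getitem__) for row in theta]
for paper, label in zip(papers, labels):
    print(label, paper["title"])
```

Topic indices are arbitrary, and on a corpus this small the grouping depends on the random seed. For a real collection, a library implementation such as Gensim's would likely be a better choice than this hand-written sampler.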