
Basic article information

  • Title: UNSUPERVISED KEYWORD EXTRACTION USING NON-SMOOTH NMF
  • Authors: ALIYA NUGUMANOVA ; DARKHAN AHMED-ZAKI ; MADINA MANSUROVA
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year of publication: 2020
  • Volume: 98
  • Issue: 22
  • Pages: 3583-3596
  • Publisher: Journal of Theoretical and Applied
  • Abstract: In this paper, we introduce a novel unsupervised method for keyword extraction based on non-smooth nonnegative matrix factorization. We generate a document-term matrix from a given corpus and factorize it into the product of two special matrices: documents-by-topics and topics-by-terms. In our method, we choose a low degree of factorization (k=3,4,5) and use only the topics-by-terms matrix to extract the top N keywords for each of the k topics. We then merge the resulting N*k keywords into a single keyword list, excluding duplicates, and assign keywords to documents. We validate our method on a large text corpus: the textbook "Introduction to Information Retrieval" (by Manning, Raghavan and Schütze), available online. The result of our method is compared with three popular unsupervised keyword extraction algorithms: TextRank, RAKE and YAKE. The experiments confirm that the proposed method shows promising performance in terms of precision, recall and F-measure across various numbers of candidate keywords.
  • Keywords: Keyword Extraction; NMF; nsNMF; NLP; Unsupervised Approach.
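
The pipeline outlined in the abstract (document-term matrix, factorization, top N terms per topic, merge without duplicates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the preprocessing, term weighting, update rules or smoothing parameter, so the whitespace tokenizer, the raw counts, the Frobenius-norm multiplicative updates, the value theta=0.5 and all function names here are assumptions. The final step of assigning keywords to documents is omitted because the abstract does not say how it is done.

```python
import numpy as np

def build_doc_term_matrix(docs):
    """Bag-of-words counts: rows are documents, columns are terms (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            V[i, index[w]] += 1.0
    return V, vocab

def ns_nmf(V, k, theta=0.5, n_iter=200, seed=0, eps=1e-9):
    """Approximate V by W @ S @ H with nonnegative W (docs x k) and H (k x terms).

    S = (1 - theta) * I + (theta / k) * ones is the non-smooth NMF smoothing matrix.
    The updates are the usual Frobenius-norm multiplicative rules with S folded in
    (an assumed choice; the paper may use a different objective).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))
    for _ in range(n_iter):
        SH = S @ H                                   # smoothed topics-by-terms
        W *= (V @ SH.T) / (W @ SH @ SH.T + eps)
        WS = W @ S                                   # smoothed documents-by-topics
        H *= (WS.T @ V) / (WS.T @ WS @ H + eps)
    return W, S, H

def top_keywords(H, vocab, n):
    """Take the n highest-weighted terms of each topic and merge them without duplicates."""
    merged = []
    for row in H:
        for j in np.argsort(-row)[:n]:
            if vocab[j] not in merged:
                merged.append(vocab[j])
    return merged

if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [
        "index compression inverted index postings",
        "query ranking scoring vector space",
        "text classification naive bayes classification",
        "inverted index query postings",
    ]
    V, vocab = build_doc_term_matrix(docs)
    W, S, H = ns_nmf(V, k=3)
    print(top_keywords(H, vocab, n=2))   # at most N*k = 6 keywords
```

Only H is used for extraction, which matches the abstract's statement that the method relies on the topics-by-terms matrix alone.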