
Article information

  • Title: SHORT TEXT TOPIC MODELING WITH EMPIRICAL LEARNING
  • Authors: Supriya A. Kinariwala; Sachin N. Deshmukh
  • Journal: Indian Journal of Computer Science and Engineering
  • Print ISSN: 2231-3850
  • Electronic ISSN: 0976-5166
  • Year of publication: 2020
  • Volume: 11
  • Issue: 5
  • Pages: 510-516
  • DOI: 10.21817/indjcse/2020/v11i5/201105168
  • Publisher: Engg Journals Publications
  • Abstract: In the modern digital era, the use of social media has been increasing exponentially, and people increasingly express their thoughts in short text. Social media websites such as Twitter and Facebook generate vast amounts of short text every second, which carries valuable real-time information. Extensive research is under way to extract knowledge from it. Short text is sparse and ambiguous, so finding latent topics in it is a major challenge. This can be addressed with an unsupervised machine learning approach known as topic modeling. This paper covers three topic modeling methods, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Semantics-assisted Non-negative Matrix Factorization (SeaNMF), and presents a comparative analysis of them. The three methods were tested on the ABCNews headline dataset, and the results were analyzed using the average Normalized Google Distance (NGD) score, which is 67.88%, 58.60%, and 59.32% for SeaNMF, NMF, and LDA respectively. The quantitative results show that the SeaNMF model clusters more meaningful and semantically similar words under each topic.
  • Keywords: Topic Modeling; Short Text; Latent Dirichlet Allocation; Non-negative Matrix Factorization; Semantics-assisted NMF
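
The abstract evaluates topics with an average Normalized Google Distance (NGD) score but does not say how the score was computed or converted to a percentage. As a rough illustration only, the sketch below applies the standard NGD formula to document frequencies from a small local corpus instead of search-engine hit counts. The function name `ngd`, the toy headlines, and the choice to return infinity for words that never co-occur are assumptions for this example, not details taken from the paper.

```python
import math

def ngd(term_x, term_y, docs):
    """Standard NGD formula, using document frequencies in `docs`
    (a list of token sets) in place of search-engine hit counts."""
    n = len(docs)
    fx = sum(1 for d in docs if term_x in d)
    fy = sum(1 for d in docs if term_y in d)
    fxy = sum(1 for d in docs if term_x in d and term_y in d)
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # assumption: no co-occurrence means maximal distance
    log_fx, log_fy = math.log(fx), math.log(fy)
    denom = math.log(n) - min(log_fx, log_fy)
    if denom == 0:
        return 0.0  # both terms occur in every document
    return (max(log_fx, log_fy) - math.log(fxy)) / denom

# Hypothetical toy corpus standing in for tokenized news headlines.
docs = [
    {"rain", "flood", "storm"},
    {"rain", "storm"},
    {"flood", "rescue"},
    {"election", "vote"},
]

print(ngd("rain", "storm", docs))  # always co-occur: distance 0
print(ngd("rain", "flood", docs))  # co-occur in one of two documents each
print(ngd("rain", "vote", docs))   # never co-occur
```

Under this definition a lower value means the two words are more closely related. A per-topic score could then be formed by averaging `ngd` over all pairs of a topic's top words, though whether the paper does so, and how it maps the result to a percentage, is not stated in the abstract.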