Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Year: 2020
Volume: 11
Issue: 5
Pages: 510-516
DOI: 10.21817/indjcse/2020/v11i5/201105168
Publisher: Engg Journals Publications
Abstract: In the current digital era, the use of social media has been increasing exponentially, and people increasingly express their thoughts in short text. Social media websites such as Twitter and Facebook generate vast amounts of short text every second, which carries useful real-time information, and extensive research is under way to extract knowledge from it. Because short text is very sparse and ambiguous, finding latent topics in it is a major challenge. This challenge can be addressed with topic modeling, an unsupervised machine learning approach. This paper covers three topic modeling methods, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Semantics-assisted Non-negative Matrix Factorization (SeaNMF), and presents a comparative analysis of them. The three methods were tested on the ABCNews headline dataset, and the results were analyzed using the average Normalized Google Distance (NGD) score, which is 67.88%, 58.60%, and 59.32% for SeaNMF, NMF, and LDA, respectively. The quantitative results show that the SeaNMF model clusters more meaningful and semantically similar words under each topic.
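The abstract names Normalized Google Distance as the evaluation measure but does not define it. As a minimal sketch, the standard NGD between two words can be computed from occurrence counts as shown below. This is not the authors' implementation: the function names `ngd` and `average_topic_ngd`, the use of document counts from a local corpus in place of search-engine hit counts, and the toy headlines are all assumptions made for illustration. How the paper converts NGD values into the percentage scores it reports is not stated in this record, so that step is not reproduced.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance from raw counts.

    fx, fy: number of documents containing each word
    fxy:    number of documents containing both words
    n:      total number of documents (assumed stand-in for index size)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Words that never co-occur are treated as infinitely distant.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        # Both words occur in every document, so they always co-occur.
        return 0.0
    return (max(lx, ly) - lxy) / denom


def average_topic_ngd(top_words, docs):
    """Average NGD over all pairs of a topic's top words (hypothetical helper)."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def count(*words):
        # Number of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    pairs = list(combinations(top_words, 2))
    total = sum(ngd(count(a), count(b), count(a, b), n) for a, b in pairs)
    return total / len(pairs)


# Toy tokenized headlines, invented for this example only.
toy_docs = [
    ["rain", "flood", "warning"],
    ["rain", "flood", "damage"],
    ["rain", "storm", "warning"],
    ["election", "vote", "count"],
]
topic_score = average_topic_ngd(["rain", "flood"], toy_docs)
```

Under the standard definition, a lower NGD means two words co-occur more often relative to their individual frequencies, so a topic whose top words have a low average pairwise NGD is more semantically coherent.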