
Article Information

  • Title: GPLDA: A Generalized Poisson Latent Dirichlet Topic Model
  • Authors: Ibrahim Bakari Bala; Mohd Zainuri Saringat
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2019
  • Volume: 10
  • Issue: 12
  • DOI: 10.14569/IJACSA.2019.0101253
  • Publisher: Science and Information Society (SAI)
  • Abstract: The earliest modifications of Latent Dirichlet Allocation (LDA) with respect to word or document attributes relaxed its exchangeability assumption via the bag-of-words (BoW) matrix. Several authors have proposed modifications of the original LDA that focus on models in which the current topic depends on the words from the previous topic. Most earlier work ignored the document length distribution, on the assumption that its effect vanishes at the modelling stage. In this paper, the Poisson document length distribution of the LDA model is therefore replaced with the Generalized Poisson (GP) distribution, whose main strength is that it can capture both overdispersed (variance larger than mean) and underdispersed (variance smaller than mean) count data. The Poisson distribution used by LDA relies strongly on the assumption that the mean and variance of document lengths are equal. This assumption is often unrealistic for real-life text data, where the variance of document length may be greater or less than the mean. Approximate estimates of the GPLDA model parameters were obtained by applying the Newton-Raphson method to the log-likelihood. A comparative analysis of GPLDA and LDA using accuracy and F1 showed improved results for GPLDA.
  • Keywords: Bag-of-word; generalized Poisson distribution; topic model; latent Dirichlet allocation
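
To make the dispersion argument in the abstract concrete, the following is a minimal Python sketch of the generalized Poisson distribution in Consul's standard parameterization. The function names and the symbols `theta` and `lam` are illustrative assumptions; this record does not state which parameterization the paper uses. Setting `lam = 0` recovers the ordinary Poisson distribution, and `0 < lam < 1` gives a variance larger than the mean. The underdispersed case (`lam < 0`) requires truncating the support and is not handled here.

```python
import math


def gp_log_pmf(x, theta, lam):
    """Log-probability of count x under a generalized Poisson distribution.

    Assumed parameterization (Consul's form, not confirmed by the record):
        P(X = x) = theta * (theta + lam * x)**(x - 1) * exp(-(theta + lam * x)) / x!
    Valid here only for theta > 0 and 0 <= lam < 1.
    """
    if theta <= 0 or not (0 <= lam < 1):
        raise ValueError("this sketch requires theta > 0 and 0 <= lam < 1")
    if x < 0:
        return float("-inf")
    rate = theta + lam * x
    return math.log(theta) + (x - 1) * math.log(rate) - rate - math.lgamma(x + 1)


def gp_mean_var(theta, lam):
    """Closed-form mean and variance for the parameterization above."""
    mean = theta / (1.0 - lam)
    var = theta / (1.0 - lam) ** 3
    return mean, var


def poisson_log_pmf(x, mu):
    """Ordinary Poisson log-pmf, used as the lam = 0 reference case."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)
```

With these definitions, `gp_mean_var(5.0, 0.3)` returns a variance larger than the mean, which is the overdispersed document-length situation the abstract says a plain Poisson cannot represent.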