Article Information

  • Title: An Arabic Lemma-Based Stemmer for Latent Topic Modeling
  • Authors: Abderrezak Brahmi; Ahmed Ech-Cherif; Abdelkader Benyettou
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Year of publication: 2013
  • Volume: 10
  • Issue: 2
  • Publisher: Zarqa Private University
  • Abstract: Developments in Arabic information retrieval have not kept pace with the growing use of the Arabic Web over the last decade. Semantic indexing in a language with rich inflectional morphology, such as Arabic, is not a trivial task and requires text analysis in the original language. Apart from cross-language retrieval methods and a few limited studies, the main efforts to develop semantic analysis methods and topic models have not included Arabic text. This paper describes our approach to analyzing semantics in Arabic texts. A new lemma-based stemmer is developed and compared to a root-based one for characterizing Arabic text. The Latent Dirichlet Allocation (LDA) model is adapted to extract Arabic latent topics from various real-world corpora. In addition to the interesting subjects discovered in press articles from the 2007-2009 period, experiments show that classification performance in the topic space improves with lemma-based stemming compared to root-based stemming.
  • Keywords: Arabic stemming; topic model; semantic analysis; classification; test collection