Article Information

  • Title: PROBABILISTIC ARABIC PART OF SPEECH TAGGER WITH UNKNOWN WORDS HANDLING
  • Authors: Mohammed Albared; Tareq Al-Moslmi; Nazlia Omar
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2016
  • Volume: 90
  • Issue: 2
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Part-of-speech (POS) tagging is an essential preprocessing step in many natural language applications. In this paper, we investigate the best configuration of a trigram Hidden Markov Model (HMM) Arabic POS tagger when only a small tagged corpus is available. With little training data, guessing the POS of unknown words is the main problem. This problem is more serious in languages such as Arabic, which have a very large vocabulary and a rich, complex morphology. To handle it, we study the effect of integrating a lexicon-based morphological analyzer on the tagger's performance. In addition, several lexical models are empirically defined, implemented, and evaluated. These models are based essentially on the internal structure and formation process of Arabic words. Several combinations of these models are also presented. The POS tagger is trained on a corpus of 29,300 words and uses a tagset of 24 POS tags. Our system achieves state-of-the-art overall accuracy in Arabic POS tagging and outperforms other Arabic taggers in unknown-word POS tagging accuracy.
  • Keywords: Part of Speech Tagger; Arabic Language; Unknown Word Guessing.