Article Information

  • Title: HMM-Based Dari Named Entity Recognition for Information Extraction
  • Authors: Ghezal Ahmad Jan Zia; Ahmad Zia Sharifi
  • Journal: Computer Science & Information Technology
  • Electronic ISSN: 2231-5403
  • Publication year: 2019
  • Volume: 9
  • Issue: 7
  • Pages: 1-9
  • DOI: 10.5121/csit.2019.90706
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: Named Entity Recognition (NER) is a fundamental subtask of information extraction that labels elements of text with categories such as person, organization, or location. The task is to detect and classify the words in a sentence that belong to named entities. This paper describes a statistical approach to modeling NER for the Dari language. Dari and Pashto are low-resource languages, spoken as official languages in Afghanistan. Named entity detection in Dari differs from the approaches used for many other languages because Dari has no capitalization to help identify named entities. We seek to bridge the gap between Dari linguistic structure and a supervised learning model that takes a sequence of words as input and predicts a paired sequence of tags as output. The Dari corpus was developed from a collection of news, reports, and articles, based on the original orthographic structure of the Dari language. In the experiments, named entity recognition reached 94% accuracy.
  • Keywords: Natural Language Processing (NLP); Hidden Markov Model (HMM); Named Entity Recognition (NER); Part-of-Speech (POS) Tagging
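
The abstract describes a model that maps a sequence of words to a paired sequence of tags, and the keywords name a Hidden Markov Model. As a rough illustration of how such a tagger can decode a sentence, the sketch below runs Viterbi decoding over a toy HMM. Everything in it is an assumption made for illustration: the tag set (`PER`, `LOC`, `O`), the transliterated tokens, the probability tables, and the `UNSEEN` floor are hypothetical and are not taken from the paper or estimated from any Dari corpus.

```python
import math

# Hypothetical tag set and toy probabilities, chosen only for illustration;
# they are not taken from the paper or estimated from any Dari corpus.
TAGS = ["PER", "LOC", "O"]

START = {"PER": 0.2, "LOC": 0.1, "O": 0.7}
TRANS = {
    "PER": {"PER": 0.4, "LOC": 0.1, "O": 0.5},
    "LOC": {"PER": 0.1, "LOC": 0.3, "O": 0.6},
    "O": {"PER": 0.2, "LOC": 0.2, "O": 0.6},
}
EMIT = {
    "PER": {"ahmad": 0.6, "zia": 0.4},
    "LOC": {"kabul": 0.9, "zia": 0.1},
    "O": {"lives": 0.5, "in": 0.5},
}
UNSEEN = 1e-6  # crude floor for words missing from a tag's toy emission table


def emit_logp(tag, word):
    """Log emission probability of `word` under `tag`, with the UNSEEN floor."""
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []

    # scores[i][tag]: best log-probability of any tag path ending in `tag`
    # at position i. Log space avoids underflow on longer sentences.
    scores = [{t: math.log(START[t]) + emit_logp(t, words[0]) for t in TAGS}]
    backpointers = []

    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in TAGS:
            # Best previous tag for reaching tag `t` at this position.
            best = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            cur[t] = prev[best] + math.log(TRANS[best][t]) + emit_logp(t, word)
            ptr[t] = best
        scores.append(cur)
        backpointers.append(ptr)

    # Start from the best final tag and follow the backpointers to the start.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(backpointers):
        tag = ptr[tag]
        path.append(tag)
    path.reverse()
    return path
```

With these toy tables, `viterbi(["ahmad", "zia", "lives", "in", "kabul"])` returns `["PER", "PER", "O", "O", "LOC"]`. The ambiguous token `zia` is resolved by the transition probabilities rather than by capitalization, which is the kind of contextual evidence a tagger has to rely on in a script without letter case.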