
Article Information

  • Title: DESIGN AND DEVELOPMENT OF DICTIONARY-BASED STEMMER FOR THE URDU LANGUAGE
  • Authors: ZAHID HUSSAIN ; SAJID IQBAL ; TANZILA SABA
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2017
  • Volume: 95
  • Issue: 15
  • Page: 3560
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Stemming reduces the variant forms of a word to its base, stem, or root form, a step that is essential for many language processing applications, including Urdu information retrieval (IR). Urdu is a resource-poor, morphologically rich language, and its multilingual vocabulary is challenging to process because of its complex morphology. Research on Urdu stemming is about a decade old, yet no work has been reported on dictionary-based Urdu stemming. The present work introduces a dictionary-based Urdu stemmer that performs better than the existing Urdu stemmers. The significance of the study lies in identifying the dictionary-based approach as the most promising one for Urdu stemming, especially when combined with a dictionary update feature. Testing shows 94.85% overall accuracy on the test data, and the results can be improved further by cleaning the test data and updating the dictionary.
  • Keywords: Dictionary based stemming; dictionary updates; infixes; Fused classification
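
The idea summarized in the abstract (look a word up in a dictionary, return its stored stem, and improve accuracy by updating the dictionary) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `DictionaryStemmer`, the function `accuracy`, the fallback of returning an unknown word unchanged, and the three Urdu word/stem pairs are all assumptions chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple


class DictionaryStemmer:
    """Hypothetical lookup-based stemmer; not the method from the paper."""

    def __init__(self, entries: Dict[str, str]) -> None:
        # Copy the mapping of surface form -> stem so later updates
        # do not modify the caller's dictionary.
        self._stems = dict(entries)

    def stem(self, word: str) -> str:
        # Assumption: a word missing from the dictionary is returned unchanged.
        return self._stems.get(word, word)

    def update(self, word: str, stem: str) -> None:
        # "Dictionary update": add or correct a single entry.
        self._stems[word] = stem


def accuracy(stemmer: DictionaryStemmer, gold: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (word, expected stem) pairs the stemmer gets right."""
    pairs = list(gold)
    if not pairs:
        return 0.0
    correct = sum(1 for word, expected in pairs if stemmer.stem(word) == expected)
    return 100.0 * correct / len(pairs)


# Illustrative entries only (plural forms of "book" and "girl").
stemmer = DictionaryStemmer({"کتابیں": "کتاب", "لڑکیاں": "لڑکی"})

# Tiny made-up gold standard; the third word is not yet in the dictionary.
gold = [("کتابیں", "کتاب"), ("لڑکیاں", "لڑکی"), ("کتابوں", "کتاب")]

before = accuracy(stemmer, gold)       # two of three pairs are stemmed correctly
stemmer.update(gold[2][0], gold[2][1])  # add the missing entry
after = accuracy(stemmer, gold)        # all three pairs are now stemmed correctly
```

In this toy setup, accuracy rises once the missing form is added, which mirrors the abstract's point that dictionary updates can improve results. The figures produced here come from the made-up data and say nothing about the 94.85% reported in the paper.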