Article information

  • Title: Improving Punjabi Part of Speech Tagger by Using Reduced Tag Set
  • Authors: Manjit Kaur; Mehak Aggarwal; Sanjeev Kumar Sharma
  • Journal: International Journal of Computer Applications and Information Technology
  • Print ISSN: 2278-7720
  • Year of publication: 2014
  • Volume: 7
  • Issue: 2
  • Pages: 142-148
  • Publisher: Mahadev Educational Society
  • Abstract: POS tagging is a fundamental task in almost all NLP applications, such as grammar checking, speech processing, and machine translation. It assigns the correct tag to each word from a set of available tags. Tagger accuracy remains the biggest challenge today. Many taggers have been proposed by different researchers for different languages (Punjabi, Hindi, Bengali, etc.) using techniques such as HMM (Hidden Markov Model), SVM (Support Vector Machine), and ME (Maximum Entropy). A Punjabi POS tagger based on HMM is one of them [1]. This tagger uses a Hidden Markov Model, a statistical technique, to tag words in the Punjabi language using the 630 tags developed by Mandeep Singh and G S Lehal [2]. This large tag set (630 tags) results in a data sparseness problem. To cope with this problem, this paper experiments with the reduced POS tag set (36 tags) proposed by Technical Development of Indian Languages (TDIL) to improve the tagging accuracy of the HMM-based POS tagger. Finally, the results were manually evaluated by a linguist.
  • Keywords: Parts-of-speech tagger; Punjabi; HMM technique; TDIL-proposed Punjabi tag set
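
The data sparseness argument in the abstract can be illustrated with a rough count of how many tag-to-tag transition probabilities an HMM tagger has to estimate. The sketch below assumes a first-order (bigram) HMM, which the abstract does not specify; the function name `transition_parameters` is hypothetical and used only for illustration.

```python
def transition_parameters(num_tags: int) -> int:
    """Count tag-to-tag transition probabilities in a first-order (bigram) HMM.

    Assumption: one probability per ordered pair of tags. The abstract does not
    state the HMM order used by the tagger.
    """
    return num_tags * num_tags


# Tag set sizes taken from the abstract.
full_tag_set = transition_parameters(630)
reduced_tag_set = transition_parameters(36)

print("630-tag set:", full_tag_set, "transition probabilities")
print("36-tag set:", reduced_tag_set, "transition probabilities")
print("ratio:", full_tag_set / reduced_tag_set)
```

Under this assumption, the same annotated corpus has to support far fewer transition estimates with the 36-tag set, so each estimate is backed by more observations. This is one plausible reading of why the reduced tag set could help, not a reproduction of the paper's experiment.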