Basic Article Information

Title: Improving Punjabi Part of Speech Tagger by Using Reduced Tag Set
Authors: Manjit Kaur; Mehak Aggarwal; Sanjeev Kumar Sharma et al.
Journal: International Journal of Computer Applications and Information Technology
Print ISSN: 2278-7720
Publication year: 2014
Volume: 7
Issue: 2
Pages: 142-148
Publisher: Mahadev Educational Society
Abstract: POS tagging is a fundamental task in almost all NLP applications, such as grammar checking, speech processing, and machine translation; it assigns the correct tag to each word from a number of available tags. The accuracy of a tagger is the biggest challenge today. Many taggers have been proposed by different researchers for different languages (Punjabi, Hindi, Bengali, etc.) using different techniques such as HMM (Hidden Markov Model), SVM (Support Vector Machine), and ME (Maximum Entropy). A Punjabi POS tagger based on the HMM model is one of them [1]. This tagger uses the Hidden Markov Model, a statistical technique, to accurately tag words in the Punjabi language using the 630 tags developed by Mandeep Singh and G S Lehal [2]. This large tag set (630 tags) results in a data sparseness problem. To cope with this problem, this research paper reports an experiment in which a reduced POS tag set (36 tags) proposed by Technical Development of Indian Languages (TDIL) is used to improve the tagging accuracy of the HMM-based POS tagger. Finally, the results were manually evaluated by a linguist.
Keywords: Parts-of-speech tagger; Punjabi; HMM technique; TDIL proposed Punjabi tag set