Basic Article Information

Title: A New Approach to Parts of Speech Tagging in Malayalam
Authors: D. Muhammad Noorul Mubarak; Sareesh Madhu; S A Shanavas et al.
Journal: International Journal of Computer Science & Information Technology (IJCSIT)
Print ISSN: 0975-4660
Online ISSN: 0975-3826
Publication year: 2015
Volume: 7
Issue: 5
Page: 121
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Parts-of-speech tagging is the process of labeling each word in a sentence. A tag indicates the word's usage in the sentence. Usually, these tags indicate a syntactic classification such as noun or verb, and sometimes include additional information, such as case markers (number, gender, etc.) and tense markers. A large number of current language processing systems use a parts-of-speech tagger for pre-processing. Two approaches are mainly followed in parts-of-speech tagging: the rule-based approach and the stochastic approach. The rule-based approach uses predefined handwritten rules. It is the oldest approach, and it uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and statistical information to assign tags to words. It uses a large corpus, so its time and space complexity are high, whereas the rule-based approach has lower time and space complexity. The stochastic approach is the more widely used one nowadays because of its accuracy. Malayalam, a language of the Dravidian family, is inflectional, with suffixes attached to root word forms. The algorithms currently in use are efficient machine learning algorithms, but they are not built for Malayalam, which affects the accuracy of Malayalam POS tagging. The proposed approach uses dictionary entries along with adjacent tag information. The algorithm uses multithreading. Tagging is done using the probability of occurrence of the sentence structure together with the dictionary entry.
Keywords: NLP; POS tagger; Rule based approach; Stochastic approach; Multithreading; Dictionary entry; Malayalam
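
The abstract describes tagging by dictionary lookup combined with adjacent tag information and multithreading, but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' algorithm. The names `DICTIONARY`, `TRANSITION`, `tag_sentence` and `tag_corpus`, the English toy words, the tag labels, and all probability values are hypothetical and chosen only for illustration. The sketch also simplifies the "probability of the sentence structure" to a greedy left-to-right choice based on the previous tag.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dictionary: each word maps to its candidate tags.
DICTIONARY = {
    "the": ["DET"],
    "dog": ["NOUN"],
    "bark": ["NOUN", "VERB"],
    "barks": ["NOUN", "VERB"],
    "loud": ["ADJ"],
}

# Hypothetical P(tag | previous tag). The values are invented for this sketch.
TRANSITION = {
    ("<s>", "DET"): 0.6,
    ("<s>", "NOUN"): 0.3,
    ("DET", "NOUN"): 0.7,
    ("DET", "ADJ"): 0.3,
    ("ADJ", "NOUN"): 0.9,
    ("NOUN", "VERB"): 0.6,
    ("NOUN", "NOUN"): 0.2,
}
DEFAULT_PROB = 0.01   # assumed fallback for tag pairs not listed above
UNKNOWN_TAG = "UNK"   # assumed tag for words missing from the dictionary


def tag_sentence(words):
    """Greedily tag one sentence from left to right.

    Candidate tags come from the dictionary; when a word has several
    candidates, the one most probable after the previous tag is chosen.
    """
    tags = []
    prev = "<s>"  # sentence-start marker
    for word in words:
        candidates = DICTIONARY.get(word.lower(), [UNKNOWN_TAG])
        best = max(candidates,
                   key=lambda t: TRANSITION.get((prev, t), DEFAULT_PROB))
        tags.append(best)
        prev = best
    return tags


def tag_corpus(sentences, workers=4):
    """Tag several sentences concurrently, one task per sentence."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the sentences.
        return list(pool.map(tag_sentence, sentences))


if __name__ == "__main__":
    corpus = [["The", "dog", "barks"], ["the", "loud", "bark"]]
    for sentence, tags in zip(corpus, tag_corpus(corpus)):
        print(list(zip(sentence, tags)))
```

In this sketch the ambiguous word "barks" is resolved as a verb after a noun, while "bark" is resolved as a noun after an adjective, which shows how the adjacent tag disambiguates dictionary entries. Sentences are independent of one another, which is why one thread task per sentence is a natural unit of parallelism here; a full system for Malayalam would also need suffix handling, which this sketch omits.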