Journal: International Journal of Computer Information Systems and Industrial Management Applications
Print ISSN: 2150-7988
Electronic ISSN: 2150-7988
Publication year: 2010
Volume: 2
Pages: 179-186
Publisher: Machine Intelligence Research Labs (MIR Labs)
Abstract: Telugu is an Indian language spoken by more than 50 million people in the country. The language is very rich in literature, and it requires advances in computational approaches. Applications such as machine translation, speech recognition, speech synthesis, and information retrieval need a powerful morphological generator to give the morphological forms of nouns and verbs. The existing Telugu morphological analyzer (TMA) is rule-based; our novel approach further improves its performance with an unsupervised stemmer that gives information about the possible decompositions of a word inflected by many morphemes. Using these possible decompositions, the root word can be extracted for words that the rule-based morphological analyzer initially did not recognize. The experiment is conducted on the CII Telugu corpus, and the improvement in performance is checked with the rule-based morphological analyzer developed by the LTRC group. In the present work, we present an unsupervised stemmer for improving the performance of the Telugu rule-based morphological analyzer. The main advantage is an increase in the performance of the rule-based analyzer from 77% to 84.2% on a set of words numbering in the hundreds. Performance can improve further if the corpus is enlarged.
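The abstract does not spell out the stemmer's algorithm, so the following is only a minimal sketch of the general idea it describes: enumerate the possible stem/suffix decompositions of a word, rank them with corpus statistics, and retry the rule-based analyzer on candidate stems when the full word is not recognized. The scoring rule (stem count times suffix count), all function names, and the `rule_based_lookup` callable are assumptions made for illustration; they are not the paper's method or the LTRC analyzer's interface.

```python
from collections import Counter


def candidate_splits(word, min_stem=2):
    """Return every (stem, suffix) split whose stem has at least min_stem characters."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word) + 1)]


def train_counts(corpus_words, min_stem=2):
    """Count in how many distinct corpus words each stem and each suffix occurs."""
    stem_counts, suffix_counts = Counter(), Counter()
    for word in set(corpus_words):
        for stem, suffix in candidate_splits(word, min_stem):
            stem_counts[stem] += 1
            suffix_counts[suffix] += 1
    return stem_counts, suffix_counts


def ranked_decompositions(word, stem_counts, suffix_counts, min_stem=2):
    """Rank the splits of word by an assumed score: stem count times suffix count."""
    scored = [
        (stem_counts[stem] * suffix_counts[suffix], stem, suffix)
        for stem, suffix in candidate_splits(word, min_stem)
    ]
    # Highest score first; ties are broken in favour of the longer stem.
    scored.sort(key=lambda item: (-item[0], -len(item[1])))
    return [(stem, suffix) for _, stem, suffix in scored]


def analyze_with_fallback(word, rule_based_lookup, stem_counts, suffix_counts):
    """Try the (hypothetical) rule-based analyzer first, then ranked candidate stems."""
    result = rule_based_lookup(word)
    if result is not None:
        return result
    for stem, _suffix in ranked_decompositions(word, stem_counts, suffix_counts):
        result = rule_based_lookup(stem)
        if result is not None:
            return result
    return None
```

In this sketch, `rule_based_lookup` is any function that returns an analysis for a known word and `None` otherwise. A fallback of this kind only helps when the corpus contains other inflected forms of the same stem, which is consistent with the abstract's remark that a larger corpus could improve the results further.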