期刊名称:International Journal of Computer Science Issues
印刷版ISSN:1694-0784
电子版ISSN:1694-0814
出版年度:2012
卷号:9
期号:4
出版社:IJCSI Press
摘要:Hindi being a highly inflectional language, FST (Finite State Transducer) based approach is most efficient for developing a morphological analyzer for this language. The work presented in this paper uses the SFST (Stuttgart Finite State Transducer) tool for generating the FST. A lexicon of root words is created. Rules are then added for generating inflectional and derivational words from these root words. The Morph Analyzer developed was used in a Part Of Speech (POS) Tagger based on Stanford POS Tagger. The system was first trained using a manually tagged corpus and MAXENT (Maximum Entropy) approach of Stanford POS tagger was then used for tagging input sentences. The morphological analyzer gives approximately 97% correct results. POS tagger gives an accuracy of approximately 87% for the sentences that have the words known to the trained model file, and 80% accuracy for the sentences that have the words unknown to the trained model file.
关键词:Morphological Analyzer; Finite State Transducer; POS Tagger; Lexicon Generator.