Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2016
Volume: 7
Issue: 4
Pages: 153-155
Language: English
Publisher: Ayushmaan Technologies
Abstract: Developing better methods for finding root words is important for improving the processing of Indian languages. India is a multilingual country, so automatic information processing and retrieval in local languages is becoming an urgent need. Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. It is the official language of the states of Telangana and Andhra Pradesh, and the volume of Telugu text documents is growing rapidly. In this paper we discuss language-dependent and language-independent approaches for Telugu. Because of the complexity of the Telugu language, we propose three methods for finding the root words in a given document: pseudo N-gramming, which is language independent, and vibhaktulu-based stemming and suffix-removal stemming, which are language dependent. Rule-based pseudo N-gramming is a hybrid model. To reap the benefits of more than one type of approach, we also consider the effectiveness of combining the two types. We focus on Telugu document retrieval.
Keywords: N-gramming; Pseudo N-gramming; Rule based pseudo N-gramming; Telugu; Vibhaktulu based stemming
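
The abstract names suffix-removal stemming as one of its language-dependent methods but does not give the rules. As a rough illustration of the general idea only, the sketch below strips the longest matching suffix from each token. The suffix list (a handful of common Telugu case markers), the function names `strip_suffix` and `stem_document`, and the `min_stem_len` threshold are all assumptions made for this example; they are not taken from the paper and do not represent the authors' algorithm.

```python
# Illustrative suffix list: a few common Telugu case markers (vibhaktulu).
# This list is an assumption for the sketch, not the paper's rule set.
SUFFIXES = ("నుండి", "యొక్క", "కొరకు", "లో", "తో", "కి", "కు", "ని", "ను")


def strip_suffix(word, suffixes=SUFFIXES, min_stem_len=2):
    """Remove the longest matching suffix from word.

    The suffix is removed only if at least min_stem_len Unicode code
    points remain; otherwise the word is returned unchanged.
    """
    # Try longer suffixes first so a short marker does not pre-empt a long one.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def stem_document(text):
    """Split text on whitespace and stem each token independently."""
    return [strip_suffix(token) for token in text.split()]
```

Under these assumptions, `strip_suffix("ఇంటిలో")` returns `"ఇంటి"`. A plain list like this over-stems words that merely happen to end in a marker, which is the kind of weakness a combined or rule-based approach would need to address.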