期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2014
卷号:5
期号:5
DOI:10.14569/IJACSA.2014.050518
出版社:Science and Information Society (SAI)
摘要:Splitting is a conventional process in most of Indian languages according to their grammar rules. It is called ‘pada vicchEdanam’ (a Sanskrit term for word splitting) and is widely used by most of the Indian languages. Splitting plays a key role in Machine Translation (MT) particularly when the source language (SL) is an Indian language. Though this splitting may not succeed completely in extracting the root words of which the compound is formed, but it shows considerable impact in Natural Language Processing (NLP) as an important phase. Though there are many types of splitting, this paper considers only consonant based and phrase based splitting.
关键词:thesai; IJACSA; thesai.org; journal; IJACSA papers; Bigram; n-gram; consonant based splitting; phrase based splitting