摘要:Machine Translation is a subfield of Natural language Processing (NLP) which uses to translate source language to target language. In this paper an attempt has been made to make a Hindi Chhattisgarhi machine translation system which is based on statistical approach. In the state of Chhattisgarh there is a long awaited need for Hindi to Chhattisgarhi machine translation system for converting Hindi into Chhattisgarhi especially for non Chhattisgarhi speaking people. In order to develop Hindi Chhattisgarhi statistical machine translation system an open source software called Moses is used. Moses is a statistical machine translation system and used to automatically train the translation model for Hindi Chhattisgarhi language pair called as parallel corpus. A collection of structured text to study linguistic properties is called corpus. This machine translation system works on parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences. In order to overcome translation problem related to proper noun and unknown words, a transliteration system is also embedded in it. These sentences are extracted from various domains like stories, novels, text books and news papers etc. This system is tested on 1000 sentences to check the grammatical correctness of sentences and it was found that an accuracy of 75% is achieved.
其他摘要:Machine Translation is a subfield of Natural language Processing (NLP) which uses to translate source language to target language. In this paper an attempt has been made to make a Hindi Chhattisgarhi machine translation system which is based on statistical approach. In the state of Chhattisgarh there is a long awaited need for Hindi to Chhattisgarhi machine translation system for converting Hindi into Chhattisgarhi especially for non Chhattisgarhi speaking people. In order to develop Hindi Chhattisgarhi statistical machine translation system an open source software called Moses is used. Moses is a statistical machine translation system and used to automatically train the translation model for Hindi Chhattisgarhi language pair called as parallel corpus. A collection of structured text to study linguistic properties is called corpus. This machine translation system works on parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences. In order to overcome translation problem related to proper noun and unknown words, a transliteration system is also embedded in it. These sentences are extracted from various domains like stories, novels, text books and news papers etc. This system is tested on 1000 sentences to check the grammatical correctness of sentences and it was found that an accuracy of 75% is achieved.