Journal: International Journal of Advanced Research in Computer Science and Software Engineering
Print ISSN: 2277-6451
Online ISSN: 2277-128X
Year: 2013
Volume: 3
Issue: 10
Publisher: S.S. Mishra
Abstract: Building a speech recognition system for any language requires a large amount of training data to improve the efficiency of recognition. Collecting that much training data for every language is not a trivial task. For languages that are syllable-centred, the problem can be addressed by properly analysing text data and arriving at a minimum set of words that covers all possible syllables. These words are collected as training data and can be used to build robust HMM models. This paper highlights the importance of the syllable unit and presents the results of a statistical analysis carried out on the CIIL text corpus for the Telugu language. The results are used to analyse the possible positions of occurrence of each syllable within words. This analysis is then used to generate a minimum training data set that covers the maximum vocabulary. The minimum data set is generated by including high-frequency syllables based on their position of occurrence in the words.
Keywords: Syllable Structure; Position of Occurrence; Minimum Data Set; Word Coverage; High-Frequency Syllables.
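
The selection idea described in the abstract (choose a small set of words that covers the frequent syllables in each position of occurrence) can be illustrated with a greedy set-cover heuristic. The following is a minimal sketch only, not the procedure used in the paper: it assumes that words are already split into syllables, that position is reduced to three hypothetical classes (initial, medial, final), and that "high-frequency" means a syllable-position pair occurring at least `min_count` times. The function names and the romanized toy words are illustrative assumptions.

```python
from collections import Counter


def position_label(index, length):
    """Classify a syllable slot; a one-syllable word counts as 'initial' (assumption)."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def word_units(syllables):
    """Return the set of (syllable, position) pairs that a word contains."""
    n = len(syllables)
    return {(syl, position_label(i, n)) for i, syl in enumerate(syllables)}


def syllable_position_counts(words):
    """Count in how many words each (syllable, position) pair occurs."""
    counts = Counter()
    for syllables in words.values():
        counts.update(word_units(syllables))
    return counts


def greedy_min_word_set(words, min_count=1):
    """Greedily pick words until every frequent (syllable, position) pair is covered.

    `words` maps a word to its list of syllables (syllabification is assumed
    to be done beforehand). Greedy set cover is a heuristic, so the result is
    small but not guaranteed to be the true minimum.
    """
    counts = syllable_position_counts(words)
    targets = {unit for unit, c in counts.items() if c >= min_count}
    units_by_word = {w: word_units(s) & targets for w, s in words.items()}

    selected = []
    uncovered = set(targets)
    while uncovered:
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(units_by_word),
                   key=lambda w: len(units_by_word[w] & uncovered))
        gain = units_by_word[best] & uncovered
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
    return selected


# Hypothetical, pre-syllabified toy vocabulary (not taken from the CIIL corpus).
toy_words = {
    "kamala": ["ka", "ma", "la"],
    "kala": ["ka", "la"],
    "mala": ["ma", "la"],
    "lata": ["la", "ta"],
}
print(greedy_min_word_set(toy_words))
```

Raising `min_count` restricts the targets to more frequent syllable-position pairs, which shrinks the selected word list at the cost of coverage for rare syllables.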