首页    期刊浏览 2024年07月09日 星期二
登录注册

文章基本信息

  • 标题:Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings
  • 本地全文:下载
  • 作者:Eshete Derb Emiru ; Shengwu Xiong ; Yaxing Li
  • 期刊名称:Information
  • 电子版ISSN:2078-2489
  • 出版年度:2021
  • 卷号:12
  • 期号:2
  • 页码:62
  • DOI:10.3390/info12020062
  • 出版社:MDPI Publishing
  • 摘要:Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems are performed at word and character levels of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes hybrid connectionist temporal classification with attention end-to-end architecture and a syllabification algorithm for Amharic automatic speech recognition system (AASR) using its phoneme-based subword units. This algorithm helps to insert the epithetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained in various Amharic subwords, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than the character-based, phoneme-based, and character-based subword counterparts. Further improvement was also obtained in proposed phoneme-based subwords with the syllabification algorithm and SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language modeling (RNNLM) baseline. These phoneme-based subword models are also useful to improve machine and speech translation tasks.
  • 关键词:Amharic; automatic speech recognition; connectionist temporal classification with attention; natural language processing; low resource language; out-of-vocabulary Amharic ; automatic speech recognition ; connectionist temporal classification with attention ; natural language processing ; low resource language ; out-of-vocabulary
国家哲学社会科学文献中心版权所有