
Basic Article Information

  • Title: Bimodal Speech Recognition: A Review
  • Authors: Priyanka Varshney; Prashant Upadhyaya; Omar Farooq
  • Journal: International Journal of Electronics and Computer Science Engineering
  • Electronic ISSN: 2277-1956
  • Publication year: 2012
  • Volume: 1
  • Issue: 3
  • Pages: 892-895
  • Publisher: Buldanshahr : IJECSE
  • Abstract: Visual information, together with audio, is important for human-machine interfaces. It not only increases the accuracy of an audio speech recognition (ASR) system but also improves its robustness. This paper presents an overview of the different approaches used for viseme recognition and reports new results for Hindi viseme recognition. The visemes were extracted from a database prepared from continuous sentences uttered by 5 native Hindi speakers. Mel frequency cepstral coefficients (MFCCs) were used as audio features, while a discrete wavelet transform (DWT) followed by a discrete cosine transform (DCT) was used for visual feature extraction. The extracted features were then passed to a discriminant-function-based classifier. The maximum improvement in recognition performance, 10.72%, is achieved at a signal-to-noise ratio (SNR) of -5 dB.
  • Keywords: Speech Recognition; Human Computer Interface; Discrete Cosine Transform (DCT); Mel Frequency Cepstral Coefficient (MFCC)
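
The abstract describes visual feature extraction as a DWT followed by a DCT, but this record does not say which wavelet, how many decomposition levels, or how many coefficients the authors kept. The sketch below is therefore only an illustration of the general idea, not the paper's implementation. It assumes a one-level Haar wavelet, keeps only the low-frequency (LL) sub-band, applies an orthonormal 2-D DCT-II, and retains the top-left block of coefficients. The function names and the `keep` parameter are hypothetical.

```python
import math


def haar_dwt2_approx(image):
    """One-level 2-D Haar DWT; returns only the low-frequency (LL) sub-band.

    Assumption: the Haar wavelet is used here for simplicity; the record
    does not state which wavelet the paper used.
    """
    rows, cols = len(image), len(image[0])
    # Orthonormal Haar LL coefficient: sum of each 2x2 block divided by 2.
    return [
        [(image[r][c] + image[r][c + 1]
          + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform the rows, then the columns."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]
    # cols_done is indexed [column][frequency]; transpose back to row-major.
    return [list(row) for row in zip(*cols_done)]


def visual_features(image, keep=2):
    """DWT followed by DCT; keep the top-left keep x keep coefficients.

    Assumption: selecting a low-frequency square block is one common choice,
    not necessarily the selection rule used in the paper.
    """
    ll = haar_dwt2_approx(image)
    coeffs = dct_2d(ll)
    return [coeffs[r][c] for r in range(keep) for c in range(keep)]


if __name__ == "__main__":
    # Hypothetical 4x4 grayscale mouth-region patch.
    patch = [[float(r + c) for c in range(4)] for r in range(4)]
    print(visual_features(patch, keep=2))
```

In a bimodal system of the kind the abstract outlines, a vector like this would be computed per video frame and combined with the MFCC vector for the corresponding audio before classification. How the two streams are fused is not specified in this record.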