Article Information

Title: Bimodal Speech Recognition: A Review
Authors: Priyanka Varshney; Prashant Upadhyaya; Omar Farooq et al.
Journal: International Journal of Electronics and Computer Science Engineering
Electronic ISSN: 2277-1956
Publication year: 2012
Volume: 1
Issue: 3
Pages: 892-895
Publisher: Buldanshahr : IJECSE
Abstract: Visual information, together with audio, is important for human-machine interfaces. It not only increases the accuracy of an Audio Speech Recognition (ASR) system but also improves its robustness. This paper presents an overview of different approaches to viseme recognition and reports new results for Hindi viseme recognition. The visemes were extracted from a database prepared from continuous sentences uttered by 5 native Hindi speakers. Mel frequency cepstral coefficients (MFCCs) were used as audio features, while the discrete wavelet transform (DWT) followed by the discrete cosine transform (DCT) was used for visual feature extraction. The extracted features were then passed to a discriminant-function-based classifier. The maximum improvement in recognition performance, 10.72%, is achieved at a signal-to-noise ratio (SNR) of -5 dB.
Keywords: Speech Recognition; Human Computer Interface; Discrete Cosine Transform (DCT); Mel Frequency Cepstral Coefficient (MFCC)
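
Illustrative sketch (not taken from the paper): the abstract describes visual feature extraction as a DWT followed by a DCT. The minimal Python example below shows one way such a chain could look on a small grayscale mouth-region image. The Haar wavelet, the single decomposition level, the use of only the low-low sub-band, and the number of retained coefficients are all assumptions, since the abstract does not specify them. The function names (`haar_ll_band`, `dct2_2d`, `visual_features`) are hypothetical.

```python
import math

def haar_dwt_1d(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def dct2_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def haar_ll_band(image):
    """Low-low (approximation) sub-band of a one-level 2-D Haar DWT."""
    rows = [haar_dwt_1d(r)[0] for r in image]             # filter along rows
    cols = [haar_dwt_1d(c)[0] for c in transpose(rows)]   # then along columns
    return transpose(cols)

def dct2_2d(block):
    """Separable 2-D DCT-II: rows first, then columns."""
    rows = [dct2_1d(r) for r in block]
    return transpose([dct2_1d(c) for c in transpose(rows)])

def visual_features(image, keep=3):
    """DWT followed by DCT; keep the top-left keep x keep coefficients.

    Assumption: low-frequency DCT coefficients serve as the feature vector.
    """
    coeffs = dct2_2d(haar_ll_band(image))
    return [coeffs[r][c] for r in range(keep) for c in range(keep)]

# Usage: an 8x8 synthetic "image" yields a 4x4 LL band and 9 features.
image = [[float((r + c) % 4) for c in range(8)] for r in range(8)]
features = visual_features(image, keep=3)
```

Under these assumptions the DWT halves the image resolution, and the DCT then concentrates most of the remaining energy in a few low-frequency coefficients, so the resulting feature vector stays short.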