Journal: International Journal of Computer Science and Network
Print ISSN: 2277-5420
Year: 2013
Volume: 2
Issue: 2
Publisher: IJCSN publisher
Abstract: Automatic Speech Recognition (ASR) is an essential component in many Human-Computer Interaction systems. A variety of ASR applications have reached high performance levels, but only in environments with controlled conditions. In this project, we reduce the noise in video lectures using bimodal feature extraction. To overcome problems caused by large amounts of acoustic noise, audio signal features need to be enhanced with additional sources of complementary information. Visual information extracted from the speaker's mouth region appears promising and well suited to giving audio-only recognition a boost. Lip/mouth detection and tracking, combined with traditional image processing methods, may offer a variety of solutions for constructing the visual front-end scheme. Furthermore, audio-visual stream fusion appears to be even more challenging, and it is crucial for designing an efficient AV recognizer. In this project, we investigate several problems in Audio-Visual Automatic Speech Recognition (AV-ASR) concerning visual feature extraction and audio-visual integration, with the aim of reducing noise in video lectures.