Article information

Title: 音声と画像の統合によるドライバの発話区間検出 (Driver voice activity detection by integrating speech and images)
Authors: 二宮芳樹; 坂義秀; 前野俊希 et al.
Journal: 映像情報メディア学会誌
Print ISSN: 1342-6907
Online ISSN: 1881-6908
Publication year: 2008
Volume: 62
Issue: 3
Pages: 435-441
DOI: 10.3169/itej.62.435
Publisher: The Institute of Image Information and Television Engineers
Abstract: Voice activity detection is an important part of developing speech functions for on-board car navigation and assistance systems. Detecting voice activity from sound information alone is difficult in a vehicle environment, which contains a wide variety of sounds and noises. We propose suitable image features and an integration method for building a robust bimodal voice activity detection (VAD) system that uses the driver's voice and facial images. As image features for VAD, we select the normalized correlation value between consecutive mouth images and the number of low-intensity pixels in the mouth image. We propose a system in which the discrimination function consists of the weighted sum of single-feature discrimination functions together with combinations of logical addition (OR) and logical multiplication (AND) of single-feature discrimination functions. The experimental results show that the proposed sound and image features can be useful and that the proposed integration method achieves a 97% hit rate, 9 percentage points higher than the previous integration method, at a false alarm rate of about 12%.
Keywords: voice activity detection; driver; feature integration; bimodal interface; lip image; speech recognition
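
Illustrative sketch: the abstract names two mouth-image features (a correlation value between consecutive mouth images and a count of low-intensity pixels) and a weighted sum of single-feature discrimination functions. The Python below is a minimal sketch of how such quantities might be computed, not the authors' implementation. The function names, the zero-mean form of the correlation, the intensity threshold of 50, the zero-denominator convention, and the weights and bias are all assumptions made for illustration; the abstract does not specify them, and the logical OR/AND combinations it mentions are not modeled here.

```python
import numpy as np


def normalized_correlation(prev_frame, curr_frame):
    """Zero-mean normalized correlation between two same-sized grayscale images.

    Values near 1 suggest the mouth region barely changed between frames;
    lower values suggest movement. (Assumed formulation.)
    """
    a = np.asarray(prev_frame).astype(np.float64).ravel()
    b = np.asarray(curr_frame).astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Correlation is undefined for a constant image; return 0.0 by
        # convention (an assumption of this sketch).
        return 0.0
    return float(np.dot(a, b) / denom)


def low_intensity_count(frame, threshold=50):
    """Number of pixels darker than `threshold` (hypothetical threshold value)."""
    return int(np.count_nonzero(np.asarray(frame) < threshold))


def weighted_sum_decision(scores, weights, bias=0.0):
    """Return True (speech) when the weighted sum of per-feature scores is positive.

    `scores` are outputs of single-feature discrimination functions; the
    weights and bias are hypothetical and would have to be tuned on data.
    """
    return float(np.dot(weights, scores) + bias) > 0.0
```

In such a sketch, each feature would first be mapped to a score by its own discrimination function, and the scores would then be combined by `weighted_sum_decision`.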