Article information

Title: Audio-Visual Based Multi-Sample Fusion to Enhance Correlation Filters Speaker Verification System
Authors: Dzati Athiar Ramli; Salina Abdul Samad; Aini Hussain et al.
Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Electronic ISSN: 0975-3397
Year: 2010
Volume: 2
Issue: 4
Pages: 1286-1294
Publisher: Engg Journals Publications
Abstract: In this study, we propose a novel approach to speaker verification that uses spectrogram images as features and Unconstrained Minimum Average Correlation Energy (UMACE) filters as classifiers. Since speech is a behavioral signal, speech data tend not to be reproduced consistently, owing to changes in speaking rate, health, emotional condition, temperature and humidity. To overcome this problem, a modification of the UMACE filter architecture is proposed that performs multi-sample fusion using speech and lipreading data. To identify the best-performing fusion scheme, five multi-sample fusion strategies, i.e. maximum, minimum, median, average and majority vote, are first tested on the speech signal data. The performance of the audio-visual system using the enhanced UMACE filters is then tested. Here, lipreading data are added to the pool of audio samples, and the best-performing fusion scheme found in the prior experiment is used as the multi-sample fusion scheme. The Digit Database was used for performance evaluation, and a performance of up to 99.64% is achieved with the enhanced UMACE filters for the speech-only system, a 6.89% improvement over the baseline approach. The audio-visual system is then observed to significantly widen the PSR score interval between authentic and imposter data and to further improve on the performance of the audio-only system, leading toward a robust verification system.
Keywords: multi-sample fusion; correlation filter; spectrographic image; lipreading; speaker verification
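
The five multi-sample fusion strategies named in the abstract (maximum, minimum, median, average and majority vote) can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption for illustration: the function name `fuse_accept`, the convention that a claim is accepted when a score is at or above a threshold, and the example scores and threshold are all hypothetical.

```python
from statistics import mean, median

# Hypothetical mapping from rule name to a function that collapses
# several per-sample scores (e.g. PSR values) into one fused score.
REDUCERS = {
    "maximum": max,
    "minimum": min,
    "median": median,
    "average": mean,
}


def fuse_accept(scores, rule, threshold):
    """Return True if the identity claim is accepted under the given rule.

    Assumption (not from the source): higher scores indicate a better
    match, and a claim is accepted when the score is >= threshold.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if rule == "majority":
        # Each sample votes on its own; accept on a strict majority.
        votes = sum(1 for s in scores if s >= threshold)
        return votes > len(scores) / 2
    if rule not in REDUCERS:
        raise ValueError(f"unknown fusion rule: {rule}")
    # Score-level rules: fuse first, then apply the threshold once.
    return REDUCERS[rule](scores) >= threshold


if __name__ == "__main__":
    example_scores = [12.0, 8.0, 25.0]  # made-up values, not from the paper
    for rule in ("maximum", "minimum", "median", "average", "majority"):
        print(rule, fuse_accept(example_scores, rule, threshold=10.0))
```

In this sketch the first four rules fuse at the score level, while majority vote fuses at the decision level, which is why it is handled separately. Under the same assumptions, scores from lipreading samples could simply be appended to the list alongside the audio scores.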