Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2012
Volume: 36
Issue: 2
Pages: 206-216
Publisher: Journal of Theoretical and Applied
Abstract: A major problem of most speaker identification systems is their unsatisfactory robustness in noisy environments. The performance of automatic speaker identification systems degrades drastically in the presence of noise and other distortions, especially when there is a noise-level mismatch between the training and testing environments. In this experimental research we study a recent robust front-end algorithm based on Gammatone Frequency Cepstral Coefficients (GFCC), combined with Voice Activity Detector (VAD) and Cepstral Mean Normalization (CMN) techniques. Our system, which uses a Gaussian Mixture Model (GMM) classifier, is implemented in the MATLAB 7 programming environment. An Expectation Maximization (EM) algorithm was used to maximize the sum of Gaussian densities until convergence was reached. Evaluation is carried out on our own database containing 51 mixed Arabic speakers. All test utterances are corrupted by multilevel White Gaussian Noise (WGN). Our aim is to study the performance of the suggested architecture and compare it with the conventional Mel Frequency Cepstral Coefficients (MFCC) method, which we implemented and tested in previous work. The experimental results confirm that the proposed method outperforms MFCC in different noisy environments. Evaluation based on recognition rate shows that both MFCC and the proposed feature extractor perform almost perfectly in low-noise environments, when the Signal-to-Noise Ratio (SNR) is greater than 35 dB (practically 100% in all cases). However, when the SNR of the test signal varies from 0 to 40 dB, the average accuracy of the MFCC method is only 52.14%, while the proposed GFCC feature extractor combined with VAD and CMN still achieves an average accuracy of 57.22%.
Keywords: Cepstral Mean Normalisation (CMN); Gammatone Frequency Cepstral Coefficients (GFCC); Gaussian Mixture Models (GMMs); Mel Frequency Cepstral Coefficients (MFCC); Robust speaker identification; Voice Activity Detector (VAD); White Gaussian Noise (WGN)
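
Illustrative note: the abstract mentions corrupting test utterances with White Gaussian Noise at several SNR levels and applying Cepstral Mean Normalization to the features. The authors worked in MATLAB and their code is not part of this record, so the following is only a minimal Python/NumPy sketch of those two generic steps. The function names, the per-utterance power estimate, and the frames-by-coefficients feature layout are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (dB).

    Assumption: signal power is estimated as the mean square over the
    whole utterance; the paper may define the noise level differently.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def cepstral_mean_normalization(features):
    """Subtract the per-coefficient mean over all frames.

    Assumption: `features` has shape (num_frames, num_coefficients),
    e.g. one row of GFCC or MFCC values per frame.
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)
```

A hypothetical use would be to call `add_white_gaussian_noise(utterance, snr_db)` for each SNR level in a 0 to 40 dB sweep, extract cepstral features from the noisy signal with whatever front end is under test, and pass the resulting matrix through `cepstral_mean_normalization` before scoring it against the speaker models.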