Basic Article Information

Title: Combining Evidence from Auditory, Instantaneous Frequency and Random Forest for Anti-Noise Speech Recognition
Author: Kun Liao
Journal: Computer Science & Information Technology
Electronic ISSN: 2231-5403
Publication year: 2021
Volume: 11
Issue: 22
Language: English
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Owing to the shortcomings of acoustic feature parameters in speech signals and the limitations of existing acoustic features in characterizing the integrity of speech information, this paper proposes a speech recognition method that combines cochlear features with a random forest. Environmental noise can threaten the stable operation of current speech recognition systems, so it is essential to develop robust systems that can identify speech at low signal-to-noise ratios. We propose a speech recognition method that combines spectral subtraction with auditory and energy feature extraction. The method first extracts novel auditory features based on cochlear filter cepstral coefficients (CFCC) and instantaneous frequency (IF), i.e., CFCCIF. Spectral subtraction is then introduced into the front end of feature extraction, and the resulting features are called enhanced auditory features (EAF). An energy feature based on the Teager energy operator (TEO) is also extracted, and the combination of the two is referred to as the fusion feature. Linear discriminant analysis (LDA) is then applied for feature selection and optimization of the fusion feature. Finally, a random forest (RF) is used as the classifier in a speaker-independent, isolated-word, small-vocabulary speech recognition system. On a Korean isolated-word database, the proposed features (EAF), after fusion with the Teager energy features, show strong robustness in noisy conditions. Our experiments show that the optimized features achieve a high recognition rate and excellent anti-noise performance in the speech recognition task.
Keywords: Cochlear filter cepstral coefficients; Teager energy features; Linear discriminant analysis; Random forest; Speech recognition
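
Illustrative note: the abstract names the Teager energy operator (TEO) as the energy feature but does not give its form. The sketch below uses the standard discrete definition, psi[n] = x[n]^2 - x[n-1]*x[n+1], and is not taken from the paper itself. The function name `teager_energy` and the test tone parameters are hypothetical, chosen only for illustration; how the paper frames, aggregates, or fuses this feature is not stated in this record.

```python
import math

def teager_energy(x):
    """Standard discrete Teager energy operator.

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test tone (not from the paper): A * cos(omega * n).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)

# For a pure sinusoid the operator is constant: A^2 * sin^2(omega).
expected = amplitude ** 2 * math.sin(omega) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is often used as an energy measure that also tracks frequency content.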