Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2020
Volume: 11
Issue: 4
DOI: 10.14569/IJACSA.2020.0110469
Publisher: Science and Information Society (SAI)
Abstract: In recent years, many researchers have worked on improving the accuracy of Automatic Speech Recognition (ASR) using deep learning. In state-of-the-art speech recognizers, Recurrent Neural Networks (RNNs) based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells have outperformed Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs). Because CNN, LSTM-RNN and DNN are strongly complementary, they can be combined in a single architecture called Convolutional Long Short-Term Memory, Deep Neural Network (CLDNN). Similarly, we propose combining CNN, GRU-RNN and DNN in a single deep architecture called Convolutional Gated Recurrent Unit, Deep Neural Network (CGDNN). In this paper, we present phoneme recognition experiments on the TIMIT data set. The proposed CGDNN model reaches a phone error rate of 15.72%. This result confirms that CGDNN outperforms each of its baseline networks used alone, as well as the CLDNN architecture.
Keywords: Automatic speech recognition; deep learning; phoneme recognition; convolutional neural network; long short-term memory; gated recurrent unit; deep neural network; recurrent neural network; CLDNN; CGDNN; TIMIT
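
Note on the metric: the abstract reports its result as a phone error rate (PER). The sketch below shows one way to compute PER, assuming the conventional definition: substitutions, deletions and insertions from a Levenshtein alignment, divided by the number of reference phones. The function names and the toy phone sequences are illustrative and do not come from the paper, and this record does not describe the authors' actual scoring setup.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phones
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER in percent: total edit errors / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference transcripts contain no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


# Toy example (hypothetical data): one substitution and one deletion
# against five reference phones.
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
per = phone_error_rate(refs, hyps)
```

With this toy input there are two errors against five reference phones, so `per` is 40.0. By the same definition, a PER of 15.72% corresponds to roughly 16 edit errors per 100 reference phones.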