Abstract: Gesture recognition with deep learning network models can automatically extract deep features from data, whereas traditional machine learning algorithms rely on manual feature extraction and suffer from poor model generalization. In this paper, a multimodal gesture recognition algorithm based on a convolutional long short-term memory network is proposed. First, a convolutional neural network (CNN) is employed to automatically extract the deeply hidden features of multimodal gesture data. Then, a time series model is constructed with a long short-term memory (LSTM) network to learn the long-term temporal dependencies of the multimodal gesture features. On this basis, multimodal gestures are classified by a softmax classifier. Finally, the method is evaluated on two dynamic gesture datasets, VIVA and NVGesture. Experimental results indicate that the proposed method achieves accuracy rates of 92.55% and 87.38% on the VIVA and NVGesture datasets, respectively, and that its recognition accuracy and convergence are better than those of the comparison algorithms.
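To make the CNN-LSTM-softmax pipeline described above concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the layer sizes, the use of a single RGB stream (rather than the paper's multimodal inputs), the clip length, and the class count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMGesture(nn.Module):
    """Sketch of the pipeline: per-frame CNN features -> LSTM over time -> softmax classifier.
    Layer sizes and input modality are assumptions, not the paper's configuration."""

    def __init__(self, num_classes, lstm_hidden=256):
        super().__init__()
        # Per-frame feature extractor (small illustrative CNN; the paper's backbone may differ).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # LSTM models long-term temporal dependencies across the frame features.
        self.lstm = nn.LSTM(input_size=64, hidden_size=lstm_hidden, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)              # final hidden state summarizes the sequence
        logits = self.classifier(h_n[-1])
        return logits.softmax(dim=-1)               # softmax class probabilities

# Usage: a batch of two 8-frame RGB clips; the class count here is illustrative.
model = CNNLSTMGesture(num_classes=25)
probs = model(torch.randn(2, 8, 3, 112, 112))       # shape: (2, 25)
```

In practice, a multimodal version would run one such feature extractor per modality (e.g. RGB, depth, optical flow) and fuse the features before or inside the LSTM; the sketch keeps a single stream only for brevity.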