期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2020
卷号:11
期号:12
页码:738-746
DOI:10.14569/IJACSA.2020.0111284
出版社:Science and Information Society (SAI)
摘要:Human action recognition has transformed from a video processing problem into multi modal machine learning problem. The objective of this work is to perform multi modal human action recognition on an ensemble hybrid network of CNN and LSTM layers. The proposed CNN - LSTM ensemble network is a 2 - stream framework with one ensemble stream learning RGB sequences and the other depth. This proposed framework can learn both temporal and spatial dynamics in both RGB and depth modal action data. The hybrid network is found to be receptive towards both spatial and temporal fields because of the hierarchical structure of CNNs and LSTMs. Finally, to test our proposed model, we used our own BVCAction3D and three RGB D benchmark action datasets. The experiments were conducted on all the datasets using the proposed framework and was found to be effective when compared to similar deep learning architectures.
关键词:Human action recogniiton; RGB D video data; convolutional neural networks; long short-term memory