Basic article information

Title: Deep neural network based two-stage Indian language identification system using glottal closure instants as anchor points
Authors: Chuya China Bhanja; Mohammad Azharuddin Laskar; Rabul Hussain Laskar; et al.
Journal: Journal of King Saud University - Computer and Information Sciences
Print ISSN: 1319-1578
Publication year: 2022
Volume: 34
Issue: 4
Pages: 1439-1454
Language: English
Publisher: Elsevier
Abstract: This paper presents a two-stage Indian language identification (TS-LID) system that consists of a tonal/non-tonal pre-classification module and an individual language identification module. It studies the effectiveness of Mean Hilbert envelope coefficients (MHEC) and Mel-frequency cepstral coefficients (MFCCs), and their combinations with prosody, in the TS-LID context. Both glottal closure instants (GCIs)-based approaches and the block processing (BP) approach have been explored. It also explores different types of analysis units, such as whole utterance and syllable. Various state-of-the-art modeling techniques have been analyzed in this work. Experiments have been carried out on the NIT Silchar language database (NITS-LD) and the OGI-Multilingual database (OGI-MLTS). The results suggest that at the pre-classification stage, for NITS-LD, the deep neural network (DNN) with syllable-level features, using GCI-based approaches, provides the highest accuracies of 90.6%, 85% and 81.3% for 30 s, 10 s and 3 s test data respectively. The GCI-based approaches outperform the BP method by as much as 7.5%, 6.2%, and 5.7%. The pre-classification module helps to improve the performance of the LID system by as much as 5.7%, 4.4% and 2.2% for 30 s, 10 s and 3 s test data respectively. The corresponding improvements for the OGI-MLTS database are 7.4%, 6.8%, and 5%.