Basic article information

Title: FEATURE SELECTION AND CLASSIFICATION OF SPEECH DATASET FOR GENDER IDENTIFICATION: A MACHINE LEARNING APPROACH
Authors: RIZWAN REHMAN ; KAUSTUVMONI BORDOLOI ; KANKANA DUTTA et al.
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2020
Volume: 98
Issue: 22
Pages: 3449-3459
Publisher: Journal of Theoretical and Applied
Abstract: In speech analysis, gender identification is one of the most complex tasks. Gender can be traced from acoustic parameters such as the formants (F1, F2, F3, F4) or the pitch (F0). It is therefore important to identify which feature or features can efficiently classify a dataset into male and female speakers. This paper attempts to classify the dataset more accurately using fewer features, chosen from among F0, F1, F2, F3, and F4. For feature selection, the Fisher score algorithm is used to find the most discriminative feature for classifying gender from the speech dataset. To cross-validate the result obtained with the Fisher score algorithm, we then applied a tree-based algorithm. The results of the two algorithms agree: both identify F0, or pitch, as the most distinctive feature. Given this agreement, we performed classification using logistic regression, a KNN classifier, SVM, and decision tree algorithms, and then evaluated and compared the accuracy of each feature under these classification techniques. The findings of this study provide a statistical means of identifying the best feature for gender identification from acoustic characteristics.
Keywords: Feature Selection; Classification; Gender Identification; Statistical Methods
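
The abstract names the Fisher score as the feature-selection criterion but this record does not give the formula the authors used. As a rough illustration only, the sketch below uses one common definition: for a single feature, the between-class scatter of the class means divided by the pooled within-class scatter. The function name `fisher_score`, the two-feature setup, and every numeric value are assumptions made up for this example; none of them come from the paper or its dataset.

```python
from collections import defaultdict


def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter / within-class scatter.

    This is one common textbook definition, assumed here for illustration;
    the paper's exact formulation is not stated in this record.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        n = len(members)
        class_mean = sum(members) / n
        # Weighted squared distance of the class mean from the overall mean.
        between += n * (class_mean - overall_mean) ** 2
        # Sum of squared deviations inside the class (n * population variance).
        within += sum((v - class_mean) ** 2 for v in members)
    return between / within


# Toy values invented purely for illustration; NOT data from the paper.
labels = ["m", "m", "m", "f", "f", "f"]
features = {
    "F0": [120.0, 110.0, 130.0, 210.0, 220.0, 200.0],
    "F1": [500.0, 520.0, 480.0, 510.0, 490.0, 530.0],
}

scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.4f}")
```

A higher score means the class means are far apart relative to the spread inside each class, so features would be ranked in descending order of score. On real data the ranking depends entirely on the measurements; the toy numbers here were chosen only to make the computation easy to follow by hand.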