Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Online ISSN: 1817-3195
Year: 2018
Volume: 96
Issue: 18
Publisher: Journal of Theoretical and Applied
Abstract: Machine learning classifiers are used to distinguish healthy individuals from patients with Parkinson's disease, using a dataset of voice measurements derived from patient speech recordings. Feature selection based on information theory is common in data mining and machine learning applications. Here, mutual information is applied to the Parkinson's disease dataset to select the subset of relevant features that contribute most to the decision-making process. The area under the curve (AUC) is used alongside mutual information for feature selection, and features are eliminated by majority voting. Five classifiers are used to classify Parkinson's disease: a multilayer feedforward artificial neural network, k-Nearest Neighbor (kNN), Support Vector Machines, Naïve Bayes, and k-Means. The dataset is preprocessed before classification, and the classifiers are trained and evaluated with k-fold cross-validation. Classifier performance is measured by accuracy and AUC, both before and after feature selection. The results are promising, particularly for the kNN classifier, while k-Means performs worst.
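The abstract does not say how continuous voice measurements are discretized for mutual information, or how the mutual-information and AUC rankings are combined in the vote. The Python sketch below shows one plausible reading, under assumptions that are not taken from the paper: features are split into equal-width bins to estimate mutual information; each feature's AUC is computed with the Mann-Whitney pairwise count and folded to max(AUC, 1 - AUC); each criterion votes to drop features scoring below its median; and a feature is eliminated only when a strict majority of criteria vote to drop it. The function names `mutual_information`, `univariate_auc`, and `select_features` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Estimate I(X;Y) in bits for a continuous feature xs and class labels ys.

    Assumption: xs is discretized into `bins` equal-width bins; the paper's
    actual estimator is not stated in the abstract.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    xb = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    joint, px, py = Counter(zip(xb, ys)), Counter(xb), Counter(ys)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def univariate_auc(xs, ys):
    """AUC of a single feature used as a score (label 1 = patient, 0 = healthy).

    Uses the Mann-Whitney pairwise count; ties count as half a win.
    Assumption: the AUC is folded to max(auc, 1 - auc) so a feature that
    separates the classes in either direction scores high.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)


def select_features(X, y, criteria):
    """Return indices of the features kept after majority voting.

    Assumption: each criterion votes to drop every feature scoring below the
    (upper) median of that criterion's scores; a feature is eliminated only
    when a strict majority of criteria vote to drop it.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    drop_votes = [0] * n_features
    for score_fn in criteria:
        scores = [score_fn(col, y) for col in columns]
        median = sorted(scores)[n_features // 2]
        for j, s in enumerate(scores):
            if s < median:
                drop_votes[j] += 1
    return [j for j in range(n_features) if 2 * drop_votes[j] <= len(criteria)]


# Hypothetical toy data: 6 recordings x 3 features (not from the actual dataset).
# Feature 0 separates the classes perfectly, feature 1 is noise, feature 2 is partial.
X = [[1, 1, 1], [2, 2, 2], [3, 3, 5], [10, 1, 4], [11, 2, 6], [12, 3, 7]]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y, [mutual_information, univariate_auc]))  # → [0, 2]
```

With two criteria, a strict majority means both must vote to drop a feature, so only the noise feature (index 1) is removed. A different MI estimator or vote threshold would change only the scoring functions or the final filter.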
Keywords: Machine Learning; Feature Selection; Mutual Information; Area Under Curve; Parkinson's Disease