Article information

Title: ENHANCING THE PERFORMANCE OF DIABETES PREDICTION USING TUNING OF HYPERPARAMETERS OF CLASSIFIERS ON IMBALANCED DATASET
Authors: Subhash Chandra Gupta; Durgesh Kumar Singh; Noopur Goel et al.
Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Publication year: 2021
Volume: 12
Issue: 6
Pages: 1646-1662
DOI: 10.21817/indjcse/2021/v12i6/211206049
Language: English
Publisher: Engg Journals Publications
Abstract: Background: The predictive ability of a classifier is important for a diabetes prediction model. The more correct predictions a classifier makes, the better the model performs. Although considerable research has been done in this area, there is still scope to improve model performance. This experimental work attempts to do so with three methods: identifying appropriate preprocessing steps, oversampling to balance the classes in the dataset, and tuning the hyperparameters of the classifiers. Methods: Four prediction models are built on preprocessed, oversampled, class-balanced datasets, which are created from the PIMA diabetes dataset using different preprocessing methods. Each model uses hyperparameter-tuned KNN, SVM, DT and random forest classifiers to classify samples as diabetic or non-diabetic. The results are recorded and analyzed, and the best model is selected by comparing the F1 scores of the classifiers across all prediction models. Results: The highest F1 scores of the classifiers on datasets D1, D2, D3 and D4 are 88.52%, 88.79%, 93.33% and 95.23% respectively, and in every model the highest score is achieved by the random forest classifier. Conclusion: Analysis of the results shows that the best prediction model is the one based on dataset D4, which is created by removing outliers and rows with missing values during preprocessing.
Keywords: Diabetes mellitus; imbalanced dataset; over-sampling; SMOTE method; random forest; support vector machine; KNN
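
The abstract and keywords name two ingredients that are easy to illustrate: SMOTE-style oversampling of the minority class and model selection by F1 score. The following is a minimal standard-library sketch of both ideas, not the authors' implementation. The function names, the parameters `k` and `seed`, and the toy data are all assumptions made for illustration; the record gives no implementation details. In practice one would more likely use an established library implementation of SMOTE and of grid search than hand-written code.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours.

    Hypothetical helper: a simplified illustration of the SMOTE idea only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed, not from the PIMA dataset): four minority-class points.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote_like_oversample(toy_minority, n_new=6, k=2)
print(len(toy_synthetic), "synthetic samples generated")
```

Each synthetic point lies on the line segment between two existing minority samples, so oversampling fills in the minority region rather than duplicating rows. Candidate hyperparameter settings can then be compared by calling `f1_score` on held-out predictions and keeping the setting with the highest score.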