Article Information

  • Title: ENHANCING THE PERFORMANCE OF DIABETES PREDICTION USING TUNING OF HYPERPARAMETERS OF CLASSIFIERS ON IMBALANCED DATASET
  • Authors: Subhash Chandra Gupta; Durgesh Kumar Singh; Noopur Goel
  • Journal: Indian Journal of Computer Science and Engineering
  • Print ISSN: 2231-3850
  • Online ISSN: 0976-5166
  • Year: 2021
  • Volume: 12
  • Issue: 6
  • Pages: 1646-1662
  • DOI: 10.21817/indjcse/2021/v12i6/211206049
  • Language: English
  • Publisher: Engg Journals Publications
  • Abstract: Background: The predictive ability of the classifier is central to a diabetes prediction model: the more correct predictions a classifier makes, the better the model performs. Although considerable research has been done in this area, there is still scope to improve model performance. This experimental work attempts to do so with three methods: identifying an appropriate preprocessing step, oversampling to produce a class-balanced dataset, and tuning the hyperparameters of the classifiers. Methods: Four prediction models are built, each on a preprocessed, oversampled, class-balanced dataset created from the PIMA diabetes dataset by a different preprocessing method. Each model uses hyperparameter-tuned KNN, SVM, decision tree (DT), and random forest classifiers to classify samples as diabetic or non-diabetic. The results are recorded and analyzed, and the best model is selected by comparing the F1 scores of the classifiers across all prediction models. Results: The highest classifier F1 scores for the models on datasets D1, D2, D3, and D4 are 88.52%, 88.79%, 93.33%, and 95.23%, respectively; in every model, the highest score is achieved by the random forest classifier. Conclusion: The best prediction model is the one based on dataset D4, which is created by removing outliers and rows with missing values during preprocessing.
  • Keywords: Diabetes mellitus; imbalanced dataset; over-sampling; SMOTE method; random forest; support vector machine; KNN
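The abstract names three steps (cleaning to build dataset D4, oversampling to a balanced class distribution, and model selection by F1 score) without implementation detail. The following is a minimal pure-Python sketch of those ideas, not the authors' code. It rests on several assumptions: a 0 in selected PIMA columns is treated as a missing value, outliers are flagged with the 1.5*IQR rule, and SMOTE is the basic variant (pick a random minority sample, pick one of its k nearest minority neighbours, interpolate uniformly between them). All function names are hypothetical, and the hyperparameter tuning of KNN, SVM, DT, and random forest is not shown.

```python
import math
import random
import statistics


def drop_missing_and_outliers(X, y, missing_cols):
    """Assumed D4-style cleaning (hypothetical helper).

    Rows with a 0 in any column of `missing_cols` are treated as having a
    missing value and dropped; then rows with any feature outside the
    1.5*IQR fences (computed on the remaining rows) are dropped as outliers.
    """
    rows = [(x, t) for x, t in zip(X, y) if all(x[c] != 0 for c in missing_cols)]
    if not rows:
        return [], []
    fences = []
    for c in range(len(rows[0][0])):
        q1, _, q3 = statistics.quantiles([x[c] for x, _ in rows], n=4)
        iqr = q3 - q1
        fences.append((q1 - 1.5 * iqr, q3 + 1.5 * iqr))
    kept = [(x, t) for x, t in rows
            if all(lo <= x[c] <= hi for c, (lo, hi) in enumerate(fences))]
    return [x for x, _ in kept], [t for _, t in kept]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, sorted by Euclidean distance to `base`.
        others = sorted((minority[j] for j in range(len(minority)) if j != i),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_minority(X, y, minority_label=1, k=5, seed=0):
    """Append SMOTE samples to the minority class until both classes are equal."""
    minority = [x for x, t in zip(X, y) if t == minority_label]
    n_new = max(sum(t != minority_label for t in y) - len(minority), 0)
    new = smote(minority, n_new, k, seed)
    return X + new, y + [minority_label] * len(new)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With real data, imbalanced-learn's `SMOTE` and scikit-learn's `f1_score` and `GridSearchCV` are the usual library counterparts to these helpers. Oversampling is normally fitted on the training split only, so that synthetic points do not leak into the evaluation data.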