Basic Article Information

Title: Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions
Alternative title: Veri Madenciliği Sınıflandırma Algoritmalarının Farklı Koşullar için Eğitsel Bir Veride Karşılaştırılması
Authors: İlhan KOYUNCU; Selahattin GELBAL
Journal: Journal of Measurement and Evaluation in Education and Psychology
Electronic ISSN: 1309-6575
Publication year: 2020
Volume: 11
Issue: 4
Pages: 325-345
DOI: 10.21031/epod.696664
Language: Turkish
Publisher: EPODDER
Abstract: The purpose of this study was to examine the performance of Naive Bayes, k-nearest neighborhood, neural networks, and logistic regression analysis in terms of sample size and test data rate in classifying students according to their mathematics performance. The target population was 62,728 students in the 15-year-old group who participated in the Programme for International Student Assessment (PISA) in 2012 from the Organisation for Economic Co-operation and Development (OECD) countries. The performance of each algorithm was tested by using 11%, 22%, 33%, 44%, and 55% of each dataset for small (500 students), medium (1000 students), and large (5000 students) sample sizes. One hundred replications were performed for each analysis. Accuracy rates, RMSE values, and total elapsed time were used as the evaluation criteria. RMSE values for each algorithm were statistically compared by using Friedman and Wilcoxon tests. The results revealed that while the classification performance of the methods increased as the sample size increased, increasing the training data ratio had different effects on the performance of the algorithms. Naive Bayes showed high performance even in small samples, performed the analyses very quickly, and was not affected by changes in the training data ratio. Logistic regression analysis was the most effective method in large samples but performed poorly in small samples. While neural networks showed a similar tendency, their overall performance was lower than that of Naive Bayes and logistic regression. The lowest performance in all conditions was obtained by the k-nearest neighborhood algorithm.
Keywords: Artificial neural networks; educational data mining; k-nearest neighborhood; logistic regression; naive Bayes