Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Year: 2017
Volume: 6
Issue: 1
Pages: 19-23
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Data mining is becoming one of the leading techniques applicable in a variety of areas. One such area is predictive analysis in the medical field. In this paper we investigate the performance of about 15 data mining classification algorithms, viz. Rnd Tree, the Quinlan decision tree algorithm (C4.5), the K-Nearest Neighbor algorithm, etc., on the "Hepatitis" dataset (derived from the UCI Machine Learning Repository), which comprises 20 attributes (including the class) and 155 instances. We also investigate the importance of feature selection: we applied three feature selection algorithms, namely Fisher filtering, Relief filtering, and Step Disc, and classified the dataset using the 15 most common classifiers. The results of this study indicate the level of accuracy as well as the importance of all the instances in detecting the future survival of a person. The classification algorithms BVM, CVM, and Rnd Tree produced 100 percent accuracy in classifying all the training data under bi-valued classes. The study also revealed that the feature selection algorithms mentioned above are not suitable for effective classification of this dataset. The classification algorithm was also applied to test data to verify its correctness.
Keywords: Data mining; classification; hepatitis; Naive Bayes; Multi Layer Perceptron; Random Forest; J48.