Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Publication year: 2017
Volume: 5
Issue: 3
Page: 5984
DOI: 10.15680/IJIRCCE.2017.0503349
Publisher: S&S Publications
Abstract: Data mining is a technique used in various domains to give meaning to the available data. In classification tree modeling, the data is classified in order to make predictions about new data. Using old data to predict new data carries the risk of overfitting to the old data. That problem can be addressed by pruning methods, which generalize the modelled tree. This paper describes the use of classification trees and shows two methods of pruning them. An experiment has been set up using different kinds of classification tree algorithms with different pruning methods to test the performance of the algorithms and pruning methods. This paper also analyzes data set properties to find relations between them and the classification algorithms and pruning methods. Classification problems in high-dimensional data with a small number of observations are becoming more common, especially in microarray data. During the last two decades, many efficient classification models and feature selection (FS) algorithms have been proposed to achieve higher prediction accuracy. However, the result of an FS algorithm based on prediction accuracy alone will be unstable under variations in the training set, especially in high-dimensional data. This paper proposes a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to the prediction accuracy.
Keywords: Q-statistic; Data Mining; Feature Selection (FS).