Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Publication year: 2013
Volume: 1
Issue: 8
Publisher: S&S Publications
Abstract: Classification is the most important technique in data mining, and the decision tree is the most important classification technique in machine learning and data mining. Data measurement errors are common in any data collection process, and in many training data sets they affect the values of numerical attributes. We extend classical (also called certain or traditional) decision tree building algorithms to handle training data sets whose numerical attributes contain measurement errors. We have found that the classification accuracy of a classical decision tree classifier can be much improved if the measurement errors in the values of numerical (or continuous) attributes in the training data sets are properly corrected or handled. The present study proposes a new algorithm for decision tree classifier construction, named Interval Decision Tree (IDT) classifier construction. IDT classifiers are more accurate and efficient than classical decision tree classifiers. An interval is constructed for each value of each attribute in the training data set; within the interval, the best error-corrected value is approximated, and then entropy is calculated. Extensive experiments show that the resulting IDT classifiers are more accurate than classical decision tree classifiers.
Keywords: error corrected interval values of the numerical attributes in the training data sets; measurement errors in the values of numerical attributes in the training data sets; training data sets containing numerical attributes; training data sets; decision tree; classification.