Article Information

  • Title: Improved Decision Tree Induction Algorithm with Feature Selection, Cross Validation, Model Complexity and Reduced Error Pruning
  • Authors: A. S. Galathiya; A. P. Ganatra; C. K. Bhensdadia
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year: 2012
  • Volume: 3
  • Issue: 2
  • Pages: 3427-3431
  • Publisher: TechScience Publications
  • Abstract: Data mining is the process of finding new patterns in data. Classification is the technique of generalizing known structure so that it can be applied to new data. A decision tree is used to model the classification process: a case is classified by routing it from the root node until it arrives at a leaf node. Decision trees can handle both continuous and categorical data. This research work compares ID3, C4.5 and C5.0. Among these classifiers, C5.0 gives the most accurate and efficient output at comparatively high speed. C5.0 also uses less memory to store its ruleset because it generates a smaller decision tree. Because the proposed system uses C5.0 as its base classifier, it supports high accuracy, good speed and low memory usage. Memory usage is low compared with other techniques because fewer rules are generated, accuracy is high because the error rate on unseen cases is low, and classification is fast because pruned trees are generated. This work proposes a C5.0 classifier that adds feature selection, cross validation, reduced error pruning and model complexity to the original C5.0 in order to reduce the error ratio. The paper describes these four techniques as they are used in the proposed system. Feature selection is used for dimensionality reduction: it reduces the attribute space of a feature set by removing irrelevant attributes. Cross-validation is one way to get a more reliable estimate of predictive accuracy. Increasing the model complexity increases classification accuracy. Applying reduced error pruning addresses the overfitting problem of the decision tree. With the proposed system, accuracy improves by about 1 to 3%, the classification error rate is reduced compared with the existing system, and the decision tree is constructed in less time.
  • Keywords: REP; Decision Tree induction; C5 classifier; KNN; SVM
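
The abstract mentions three mechanisms that are easy to illustrate on a toy example: routing a case from the root to a leaf, reduced error pruning (REP) against held-out data, and splitting data into folds for cross validation. The sketch below is only an illustration under stated assumptions. It is not C5.0 and not the authors' implementation; the names `Node`, `classify`, `errors`, `reduced_error_prune` and `k_fold_indices`, the hand-built tree, and the validation rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass
class Node:
    label: str                    # majority class assumed for cases reaching this node
    attr: Optional[str] = None    # attribute tested here; None means the node is a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(node: Node, x: Dict[str, str]) -> str:
    """Route x from the root until a leaf (or an unseen attribute value) is reached."""
    while node.attr is not None:
        child = node.children.get(x.get(node.attr))
        if child is None:         # unseen value: fall back to this node's label
            break
        node = child
    return node.label


def errors(node: Node, rows: List[Row]) -> int:
    """Count misclassified rows."""
    return sum(classify(node, x) != y for x, y in rows)


def reduced_error_prune(node: Node, val_rows: List[Row]) -> Node:
    """Bottom-up REP: replace a subtree by a leaf when that does not
    increase the error on the validation rows that reach it."""
    if node.attr is None:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in val_rows if x.get(node.attr) == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf = Node(label=node.label)
    return leaf if errors(leaf, val_rows) <= errors(node, val_rows) else node


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k disjoint test folds (assumption: simple striding)."""
    return [list(range(i, n, k)) for i in range(k)]


# Hypothetical overfit tree: the "windy" split under "sunny" fits noise.
tree = Node(label="yes", attr="outlook", children={
    "sunny": Node(label="no", attr="windy", children={
        "true": Node(label="no"),
        "false": Node(label="yes"),
    }),
    "rainy": Node(label="yes"),
})

# Hypothetical held-out validation rows.
validation: List[Row] = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "yes"),
]

errors_before = errors(tree, validation)
pruned = reduced_error_prune(tree, validation)
errors_after = errors(pruned, validation)
```

In this toy case the "windy" subtree is collapsed into a single leaf because doing so does not raise the validation error, while the root split on "outlook" is kept because removing it would. A cross-validated variant would repeat the prune-and-evaluate step once per fold returned by `k_fold_indices`.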