Abstract: To address the heavy computation, large tree size, and over-fitting that affect multivariate decision tree (MDT) algorithms, we propose a novel rough set-based multivariate decision tree (RSMDT) method. The positive region degree of the condition attributes with respect to the decision attributes, a measure from rough set theory, is used to select the attributes for multivariate tests. A new concept, the extended generalization of one equivalence relation with respect to another, is introduced and used to construct the multivariate tests. We evaluate the RSMDT algorithm experimentally in terms of classification accuracy, tree size, and computing time on all 36 UCI Machine Learning Repository data sets selected by the Weka platform, and compare it with C4.5, classification and regression trees (CART), classification and regression trees with linear combinations (CART-LC), Oblique Classifier 1 (OC1), and Quick Unbiased Efficient Statistical Trees (QUEST). The experimental results indicate that RSMDT significantly outperforms the compared classification algorithms, with higher classification accuracy, relatively small tree size, and shorter computing time.
Keywords: decision tree; classification; multivariate decision trees (MDT); rough set; positive region; generalization
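
As an illustration of the attribute-selection measure named in the abstract, the sketch below computes the standard rough set positive region and its degree, |POS_C(D)| / |U|, for a set of condition attributes C and decision attributes D over a universe U. It is a minimal sketch of the textbook definition only and does not reproduce the RSMDT algorithm or the extended generalization concept. The function names (`partition`, `positive_region`, `positive_region_degree`) and the toy table are assumptions made for this example and are not taken from the paper's experiments.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def positive_region(rows, cond_attrs, dec_attrs):
    """Return indices of rows whose condition class lies inside one decision class."""
    dec_blocks = partition(rows, dec_attrs)
    pos = set()
    for block in partition(rows, cond_attrs):
        # A condition class is in the positive region if it is consistent,
        # i.e. fully contained in a single decision class.
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos


def positive_region_degree(rows, cond_attrs, dec_attrs):
    """Fraction of the universe covered by the positive region."""
    if not rows:
        return 0.0
    return len(positive_region(rows, cond_attrs, dec_attrs)) / len(rows)


# Hypothetical toy decision table (illustrative only).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
]

for attrs in (["outlook"], ["windy"], ["outlook", "windy"]):
    print(attrs, positive_region_degree(rows, attrs, ["play"]))
```

On this toy table the attribute pair scores higher than either attribute alone, which is the kind of comparison an attribute-selection step based on the positive region could use to decide which attributes enter a multivariate test.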