Journal: Advances in Electrical and Computer Engineering
Print ISSN: 1582-7445
Electronic ISSN: 1844-7600
Publication year: 2014
Volume: 14
Issue: 1
Pages: 65-68
DOI: 10.4316/AECE.2014.01010
Publisher: Universitatea "Stefan cel Mare" Suceava
Abstract: There is growing interest in processing large amounts of data with well-known decision-tree learning algorithms. It is essential to build a decision tree over a large dataset as fast as possible, without a substantial decrease in accuracy and using as little memory as possible. In this paper we present an improved C4.5 algorithm that uses a compression mechanism to store the training and test data in memory. We also present a very fast tree pruning algorithm. Our experiments show that the presented algorithms outperform C5.0 in speed and classification accuracy in most cases, at the expense of tree size: the resulting trees are larger than those produced by C5.0. The data compression and pruning algorithms can be easily parallelized to achieve further speedup.