Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Online ISSN: 2277-128X
Year: 2012
Volume: 2
Issue: 4
Publisher: S.S. Mishra
Abstract: Databases are rich with hidden information that can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends; such analysis helps provide a better understanding of the data at large. Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. Decision tree induction is the learning of decision trees from class-labeled training tuples; a decision tree is a flowchart-like tree structure. The individual tuples making up the training set are referred to as training tuples, are selected from the database under analysis, and are analyzed by a classification algorithm. Many classification methods have been proposed by researchers in machine learning, pattern recognition, and statistics, but most algorithms are memory resident and typically assume a small data size. In this paper we describe a basic algorithm for learning decision trees, called decision tree induction. During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes; popular attribute selection measures are given. When decision trees are built, many of the branches may reflect noise or outliers in the training data. Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data. Tree pruning and scalability issues for the induction of decision trees from large databases are discussed.
Keywords: Classification; Decision Tree Induction; Data Partitions; Information Gain; Gain Ratio; Gini Index; Tree Pruning.
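The abstract names the standard attribute selection measures used during tree construction (information gain, gain ratio, and Gini index). A minimal sketch of how these measures score a candidate split is below; the function names and the toy label list are illustrative, not from the paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Entropy reduction from splitting `labels` into the given partitions."""
    n = len(labels)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - weighted

def gain_ratio(labels, partitions):
    """Information gain normalized by split information, which penalizes
    attributes that produce many small partitions."""
    n = len(labels)
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions if p)
    return information_gain(labels, partitions) / split_info if split_info else 0.0

# Toy class-labeled training tuples: 9 positive, 5 negative.
labels = ["yes"] * 9 + ["no"] * 5
# A candidate attribute that happens to separate the classes perfectly.
perfect_split = [["yes"] * 9, ["no"] * 5]

print(round(entropy(labels), 3))                        # entropy of the full set
print(round(information_gain(labels, perfect_split), 3))  # gain of the perfect split
print(gini(["yes"] * 9))                                # impurity of a pure partition
```

During induction, each candidate attribute's partitioning is scored this way and the attribute with the best score (highest gain or gain ratio, or lowest Gini-weighted impurity) is chosen as the splitting attribute for the current node.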