Abstract: We show the connection between symbolic data analysis (SDA) and certain supervised learning algorithms for predicting a continuous or categorical outcome. In the context of SDA, we previously developed a tree-growing algorithm that allowed us to handle imprecise data and to construct classical prediction trees from such data. We later returned to tree growing for classical (numerical) data and proposed the notion of a probabilistic, or soft, node, that is, a node representing a decision of the type ‘go left with probability p and go right with probability 1 - p’. Such a tree-shaped predictor describes the conditional predictive distribution of the outcome as a mixture of distributions whose mixing coefficients are functions of certain predictor variables, selected by the algorithm on the basis of the data. We describe an EM approach to estimating the parameters of the predictive model. The method is evaluated through simulations and real-data analyses. In conclusion, we discuss the advantages and limitations of trees with soft nodes compared with conventional prediction trees.
Keywords: supervised learning, prediction, trees with soft nodes, symbolic data analysis, imprecise data.
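To make the idea of a soft node concrete, here is a minimal sketch of a single soft node for a continuous outcome. It shows how the conditional predictive distribution becomes a two-component mixture, and it computes the posterior routing probabilities that the E-step of an EM algorithm would need. The logistic gating function p(x) = sigmoid(a + b·x), the single predictor x, the Gaussian leaf distributions, and the names `SoftNode`, `p_left`, `density`, `mean` and `responsibility_left` are all illustrative assumptions, not the authors' parametrization. The M-step is not shown.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function (assumed gating function)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


class SoftNode:
    """Hypothetical soft node: go left with probability p(x), right with 1 - p(x).

    Assumed parametrization: p(x) = sigmoid(a + b * x); each leaf holds a
    Gaussian distribution (mu, sigma) for the outcome y.
    """

    def __init__(self, a: float, b: float, left: tuple, right: tuple):
        self.a, self.b = a, b
        self.left = left    # (mu, sigma) of the left leaf
        self.right = right  # (mu, sigma) of the right leaf

    def p_left(self, x: float) -> float:
        """Mixing coefficient: probability of going left given predictor x."""
        return sigmoid(self.a + self.b * x)

    def density(self, y: float, x: float) -> float:
        """Conditional predictive density f(y | x) as a two-component mixture."""
        p = self.p_left(x)
        return p * normal_pdf(y, *self.left) + (1.0 - p) * normal_pdf(y, *self.right)

    def mean(self, x: float) -> float:
        """Predictive mean E[y | x] = p(x) * mu_L + (1 - p(x)) * mu_R."""
        p = self.p_left(x)
        return p * self.left[0] + (1.0 - p) * self.right[0]

    def responsibility_left(self, y: float, x: float) -> float:
        """E-step quantity: posterior probability that (x, y) went to the left leaf."""
        p = self.p_left(x)
        return p * normal_pdf(y, *self.left) / self.density(y, x)


# Usage: at x = 0 the node splits 50/50, so the predictive mean lies halfway
# between the leaf means; as x grows, the node routes almost surely to the left.
node = SoftNode(a=0.0, b=2.0, left=(-1.0, 1.0), right=(3.0, 1.0))
print(node.mean(0.0))                      # 1.0
print(round(node.responsibility_left(1.0, 0.0), 6))  # 0.5
```

Unlike a conventional split, the node never commits an observation to one branch: the routing probability enters the likelihood, which is why the leaf memberships are treated as latent variables in an EM scheme.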