Journal: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2020
Volume: 117
Issue: 35
Pages: 21175-21184
DOI: 10.1073/pnas.1921562117
Publisher: The National Academy of Sciences of the United States of America
Abstract: A method for decision tree induction is presented. Given a set of predictor variables $\mathbf{x} = (x_1, x_2, \ldots, x_p)$ and two outcome variables $y$ and $z$ associated with each $\mathbf{x}$, the goal is to identify those values of $\mathbf{x}$ for which the respective distributions of $y \mid \mathbf{x}$ and $z \mid \mathbf{x}$, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution $p_y(y \mid \mathbf{x})$, as a function of $\mathbf{x}$. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of $y$ at each $\mathbf{x}$ under no assumptions concerning its shape, form, or parametric representation.
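Illustration: the core idea of the abstract, finding the region of predictor space where two outcomes differ most, can be sketched with a single split on one predictor. This is a simplified sketch and not the paper's algorithm: the function names `region_discrepancy` and `best_contrast_split`, the `min_leaf` parameter, and the choice of absolute mean difference as the discrepancy measure are all assumptions made for the example. A full contrast tree would presumably split recursively over several predictors and could compare other properties such as quantiles.

```python
from statistics import mean


def region_discrepancy(y, z):
    """Absolute difference between the means of y and z within one region.

    Assumed discrepancy measure for this sketch; other choices are possible.
    """
    return abs(mean(y) - mean(z))


def best_contrast_split(x, y, z, min_leaf=5):
    """Scan thresholds on a single predictor x and return the split whose
    child region shows the largest discrepancy between y and z.

    Returns (threshold, side, discrepancy), where side is "left" (x <= t)
    or "right" (x > t). Returns None if no valid split exists.
    """
    # Sort all three sequences by the predictor value.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    zs = [z[i] for i in order]

    best = None
    # Each child region must hold at least min_leaf observations.
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot place a threshold between tied values
        t = (xs[k - 1] + xs[k]) / 2
        for side, lo, hi in (("left", 0, k), ("right", k, len(xs))):
            d = region_discrepancy(ys[lo:hi], zs[lo:hi])
            if best is None or d > best[2]:
                best = (t, side, d)
    return best


if __name__ == "__main__":
    # Hypothetical data: y is the observed outcome, z a model's prediction
    # that is accurate for x < 10 and biased by +1 for x >= 10.
    x = list(range(20))
    y = [0] * 20
    z = [0] * 10 + [1] * 10
    print(best_contrast_split(x, y, z))
```

In this toy setting, y could stand for observed outcomes and z for a model's predictions, so the returned region is where the model fits worst, which mirrors the diagnostic use described in the abstract.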