Article information

Title: Predictor augmentation in random forests
Authors: Ruo Xu; Dan Nettleton; Daniel J. Nordman et al.
Journal: Statistics and Its Interface
Print ISSN: 1938-7989
Electronic ISSN: 1938-7997
Publication year: 2014
Volume: 7
Issue: 2
Pages: 177-186
DOI: 10.4310/SII.2014.v7.n2.a3
Publisher: International Press
Abstract: Random forest (RF) methodology is an increasingly popular nonparametric methodology for prediction in both regression and classification problems. We describe a behavior of random forests (RFs) that may be unknown and surprising to many initial users of the methodology: out-of-sample prediction by RFs can sometimes be improved by augmenting the dataset with a new explanatory variable, independent of all variables in the original dataset. We explain this phenomenon with a simulated example, and show how independent variable augmentation can help RFs decrease prediction variance and improve prediction performance in some cases. We also give real data examples for illustration, argue that this phenomenon is closely connected with overfitting, and suggest potential research for improving RFs.
Keywords: classification; machine learning; prediction; regression
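
Illustration: the augmentation step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code. The function name `augment_with_independent_predictor`, the choice of standard normal noise, and the toy data are all assumptions made for illustration; the record above does not specify how the independent variable was generated.

```python
import random


def augment_with_independent_predictor(rows, seed=0):
    """Return a copy of `rows` with one extra column of i.i.d. N(0, 1) noise.

    Hypothetical helper: the noise comes from its own generator, so it is
    independent of every original predictor and of the response.
    """
    rng = random.Random(seed)
    # Build new rows so the original data is left unmodified.
    return [list(row) + [rng.gauss(0.0, 1.0)] for row in rows]


# Toy design matrix: 5 observations, 2 original predictors (made-up values).
X = [[0.1, 1.0], [0.4, 0.0], [0.3, 1.0], [0.9, 0.0], [0.7, 1.0]]
X_aug = augment_with_independent_predictor(X, seed=42)
```

To probe the phenomenon, one would fit the same random forest configuration once on `X` and once on `X_aug` and compare out-of-sample error on held-out data. According to the abstract, the augmented fit improves prediction only in some cases, so no particular outcome should be expected from a single run.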