首页    期刊浏览 2024年11月26日 星期二
登录注册

文章基本信息

  • 标题:Search for the smallest random forest
  • 本地全文:下载
  • 作者:Minghui Wang ; Heping Zhang
  • 期刊名称:Statistics and Its Interface
  • 印刷版ISSN:1938-7989
  • 电子版ISSN:1938-7997
  • 出版年度:2009
  • 卷号:2
  • 期号:3
  • 页码:381-388
  • DOI:10.4310/SII.2009.v2.n3.a11
  • 出版社:International Press
  • 摘要:Random forests have emerged as one of the most commonly used nonparametric statistical methods in many scientific areas, particularly in analysis of high throughput genomic data. A general practice in using random forests is to generate a sufficiently large number of trees, although it is subjective as to how large is sufficient. Furthermore, random forests are viewed as “black-box” because of its sheer size. In this work, we address a fundamental issue in the use of random forests: how large does a random forest have to be? To this end, we propose a specific method to find a sub-forest (e.g., in a single digit number of trees) that can achieve the prediction accuracy of a large random forest (in the order of thousands of trees). We tested it on extensive simulation studies and a real study on prognosis of breast cancer. The results show that such sub-forests usually exist and most of them are very small, suggesting they are actually the “representatives” of the whole random forests. We conclude that the sub-forests are indeed the core of a random forest. Thus it is not necessary to use the whole forest for satisfying prediction performance. Also, by reducing the size of a random forest to a manageable size, the random forest is no longer a black-box.
  • 关键词:random forest; classification; smallest forest
国家哲学社会科学文献中心版权所有