Basic article information

  • Title: Framework for making better predictions by directly estimating variables’ predictivity
  • Authors: Adeline Lo; Herman Chernoff; Tian Zheng
  • Journal: Proceedings of the National Academy of Sciences
  • Print ISSN: 0027-8424
  • Electronic ISSN: 1091-6490
  • Publication year: 2016
  • Volume: 113
  • Issue: 50
  • Pages: 14277-14282
  • DOI: 10.1073/pnas.1616647113
  • Language: English
  • Publisher: The National Academy of Sciences of the United States of America
  • Abstract: Significance: Good prediction, especially in the context of big data, is important. Common approaches to prediction include using a significance-based criterion for evaluating variables to use in models and evaluating variables and models simultaneously for prediction using cross-validation or independent test data. The first approach can lead to choosing less-predictive variables, because significance does not imply predictivity. The second approach can be improved by treating a variable's predictivity as a parameter to be estimated. The literature currently lacks measures that do this. We suggest a measure that evaluates variables' abilities to predict, the I-score. The I-score is effective in differentiating between noisy and predictive variables in big data and can be related to a lower bound for the correct prediction rate.
We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the partition retention (PR) method of variable selection (VS) yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the PR method and the I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.
  • Keywords: prediction; variable selection; high-dimensional data; predictivity
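
The abstract names the I-score but does not state its formula. As an illustration only, the sketch below assumes the form usually given in the partition retention literature: the discrete variables in a set partition the sample into cells j with counts n_j and cell means Ȳ_j, and I = Σ_j n_j² (Ȳ_j − Ȳ)² / Σ_i (Y_i − Ȳ)². The function name `i_score`, this normalization, and the toy data are assumptions made for this example and are not taken from the record above.

```python
import numpy as np


def i_score(X, y):
    """Assumed form of the I-score for a set of discrete variables.

    X: array of shape (n,) or (n, k); each distinct row defines one partition cell.
    y: response of length n.
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    n = y.size
    y_bar = y.mean()
    total_ss = np.sum((y - y_bar) ** 2)  # equals n times the sample variance
    if total_ss == 0:
        return 0.0  # a constant response carries no signal to attribute

    # Map every observation to the index of its partition cell.
    _, cell_ids = np.unique(X.reshape(n, -1), axis=0, return_inverse=True)
    cell_ids = cell_ids.ravel()

    counts = np.bincount(cell_ids)                         # n_j
    cell_means = np.bincount(cell_ids, weights=y) / counts  # mean of y in cell j
    return float(np.sum(counts ** 2 * (cell_means - y_bar) ** 2) / total_ss)


# Toy data: y is the XOR of two binary variables, so neither variable
# is informative alone, but the pair determines y exactly.
X_demo = np.tile(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), (25, 1))
y_demo = X_demo[:, 0] ^ X_demo[:, 1]

score_pair = i_score(X_demo, y_demo)           # 25.0
score_single = i_score(X_demo[:, [0]], y_demo)  # 0.0
```

On this toy data the pair of variables scores far above either variable alone, which is the kind of joint effect a marginal, significance-based screen would miss.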