Article Information

  • Title: Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making
  • Authors: Brigitte Colin; Samuel Clifford; Paul Wu
  • Journal: Open Journal of Statistics
  • Print ISSN: 2161-718X
  • Online ISSN: 2161-7198
  • Year: 2017
  • Volume: 07
  • Issue: 05
  • Pages: 859-875
  • DOI: 10.4236/ojs.2017.75061
  • Language: English
  • Publisher: Scientific Research Publishing
  • Abstract: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely the Boosted Regression Tree (BRT), can address Big Data challenges to drive decision-making. The central challenge in this study is a lack of interoperability. The data are stored in monolithic hardware components, and they comprise a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file containing inconsistencies and redundancies. Here, it is shown that BRT can handle different data granularities, heterogeneous data and missingness. In particular, BRT has the advantage of handling missing data by default: it allows a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results. Variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms both in this instance. BRT can also serve as a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single BRT or an ensemble of BRTs could be tested alongside existing models to improve results for a wide range of data-driven decisions and applications.
  • Keywords: Boosted Regression Trees; Remotely Sensed Data; Big Data Modelling Approach; Missing Data
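The abstract names two BRT mechanisms:

  • splits that route missing values on their own;
  • variable selection by how often a variable defines a split.

The plain-Python sketch below illustrates both under stated assumptions. It is not the authors' implementation. It assumes:

  • squared-error loss;
  • depth-1 trees (stumps);
  • a three-way split with a dedicated branch for missing values, encoded as None;
  • no row subsampling.

All function names (fit_stump, fit_brt, predict, split_frequency) and the toy data are hypothetical.

```python
from collections import Counter


def _branch(x, t):
    """Route one feature value to a branch: missing, left (x < t) or right."""
    if x is None:
        return "missing"
    return "left" if x < t else "right"


def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r using a three-way split.

    Returns (sse, feature_index, threshold, branch_means), or None if no
    feature has at least two distinct observed (non-missing) values.
    """
    best = None
    for j in range(len(X[0])):
        observed = sorted({row[j] for row in X if row[j] is not None})
        # Candidate thresholds: midpoints between consecutive observed values.
        for a, b in zip(observed, observed[1:]):
            t = (a + b) / 2
            groups = {"left": [], "right": [], "missing": []}
            for row, ri in zip(X, r):
                groups[_branch(row[j], t)].append(ri)
            # Each branch (including "missing") predicts its mean residual.
            means = {k: (sum(v) / len(v) if v else 0.0) for k, v in groups.items()}
            sse = sum((ri - means[k]) ** 2 for k, v in groups.items() for ri in v)
            if best is None or sse < best[0]:
                best = (sse, j, t, means)
    return best


def fit_brt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and stump base learners."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    trees = []
    split_counts = Counter()
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        _, j, t, means = stump
        trees.append((j, t, means))
        split_counts[j] += 1  # count how often each feature defines a split
        # Add a shrunken copy of the new stump's prediction to every row.
        pred = [p + learning_rate * means[_branch(row[j], t)]
                for p, row in zip(pred, X)]
    return {"f0": f0, "trees": trees, "lr": learning_rate,
            "split_counts": split_counts}


def predict(model, row):
    """Sum the shrunken contributions of all stumps for one row."""
    out = model["f0"]
    for j, t, means in model["trees"]:
        out += model["lr"] * means[_branch(row[j], t)]
    return out


def split_frequency(model):
    """Relative split frequency per feature, used as a simple importance score."""
    total = sum(model["split_counts"].values())
    return {j: c / total for j, c in model["split_counts"].items()}


if __name__ == "__main__":
    # Toy data: feature 0 is informative and sometimes missing; feature 1 is noise.
    X = [[i, (i * 7) % 5] for i in range(10)] + [[None, 3], [None, 1]]
    y = [0.0 if i < 5 else 10.0 for i in range(10)] + [20.0, 20.0]
    model = fit_brt(X, y)
    print(split_frequency(model))             # {0: 1.0}
    print(round(predict(model, [None, 2]), 2))  # 19.94
```

Missing values get their own branch with its own fitted mean. As a result, rows that lack a feature still inform the fit without prior imputation. Practical BRT implementations typically add deeper trees, row subsampling and a cross-validated choice of the number of trees. Those are omitted here for brevity.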