Article Information

Title: Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making
Authors: Brigitte Colin; Samuel Clifford; Paul Wu et al.
Journal: Open Journal of Statistics
Print ISSN: 2161-718X
Online ISSN: 2161-7198
Year: 2017
Volume: 07
Issue: 05
Pages: 859-875
DOI: 10.4236/ojs.2017.75061
Language: English
Publisher: Scientific Research Publishing
Abstract: Challenges in Big Data analysis arise due to the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Tree (BRT), can address Big Data challenges to drive decision making. The challenge of this study is lack of interoperability since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. By merging the data sources together, a structured but noisy input file, showing inconsistencies and redundancies, was created. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT has the advantage of dealing with missing data by default by allowing a split on whether or not a value is missing as well as what the value is. Most importantly, the BRT offers a wide range of possibilities regarding the interpretation of results and variable selection is automatically performed by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real world scenarios. For example, a single or ensemble approach of BRT could be tested with existing models in order to improve results for a wide range of data-driven decisions and applications.
Keywords: Boosted Regression Trees; Remotely Sensed Data; Big Data Modelling Approach; Missing Data