Abstract: Preserving air quality in urban areas is crucial for the health of the population as well as for the environment. The availability of large volumes of measurement data on the concentrations of air pollutants enables their analysis and modelling to establish trends and dependencies, in order to forecast and prevent future pollution. This study proposes a new approach for modelling air pollutant data that combines the powerful machine learning method Random Forest (RF) with the Auto-Regressive Integrated Moving Average (ARIMA) methodology. First, an RF model of the pollutant is built and analysed in relation to the meteorological variables. This model is then corrected by modelling its residuals with a univariate ARIMA model. The approach is demonstrated on hourly data for seven air pollutants (O₃, NOx, NO, NO₂, CO, SO₂, PM₁₀) in the town of Dimitrovgrad, Bulgaria, covering 9 years and 3 months. Six meteorological and three time variables are used as predictors. The resulting high-performance models explain the data with R² between 90% and 98%.
Keywords: Machine learning; Random Forest; Autoregressive integrated moving average; error correction; time series; forecasting.