Abstract:
Background Long-term surface NO₂ data are essential for retrospective policy evaluation and chronic human exposure assessment. Because no NO₂ observations exist for Mainland China before 2013, training a model on 2013–2018 data to make predictions for 2005–2012 (back-extrapolation) could cause substantial estimation bias due to concept drift.
Objective This study aims to correct that estimation bias in order to reconstruct the spatiotemporal distribution of daily surface NO₂ levels across China during 2005–2018.
Methods On the basis of ground- and satellite-based data, we proposed robust back-extrapolation with a random forest (RBE-RF), which simulates surface NO₂ through intermediate modeling of scaling factors. For comparison, we also employed a random forest (Base-RF), as a representative of the commonly used approach, to model surface NO₂ levels directly.
Results Validation against Taiwan’s NO₂ observations during 2005–2012 showed that RBE-RF adequately corrected the substantial underestimation by Base-RF. The RMSE decreased from 10.1 to 8.2 µg/m³, from 7.1 to 4.3 µg/m³, and from 6.1 to 2.9 µg/m³ in predicting daily, monthly, and annual levels, respectively. For North China, the most severely polluted region, the population-weighted NO₂ ([NO₂]pw) during 2005–2012 was estimated at 40.2 and 50.9 µg/m³ by Base-RF and RBE-RF, respectively, a difference of 21.0%. While both models predicted that the national annual [NO₂]pw increased during 2005–2011 and then decreased, Base-RF underestimated the interannual trends by >50.2% relative to RBE-RF. During 2005–2018, the nationwide population living in areas with NO₂ > 40 µg/m³ was estimated at 259 and 460 million by Base-RF and RBE-RF, respectively.
Conclusion With RBE-RF, we corrected the estimation bias in back-extrapolation and obtained a full-coverage dataset of daily surface NO₂ across China during 2005–2018, which is valuable for environmental management and epidemiological research.
Keywords: Nitrogen dioxide; Long term; Back extrapolation; Machine learning; Concept drift; Exposure assessment
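
The population-weighted NO₂ and the 21.0% difference quoted in the Results can be illustrated with a minimal sketch. It assumes the conventional definition of a population-weighted mean (grid-cell concentrations weighted by grid-cell population) and assumes the percentage is taken relative to the RBE-RF estimate, which is consistent with the reported numbers. The function names and the toy grid are hypothetical and are not taken from the study.

```python
import numpy as np


def population_weighted_mean(concentration, population):
    """Population-weighted mean concentration over grid cells.

    Assumed conventional definition: sum(pop_i * c_i) / sum(pop_i).
    Cells with missing concentration or non-positive population are skipped.
    """
    c = np.asarray(concentration, dtype=float)
    p = np.asarray(population, dtype=float)
    if c.shape != p.shape:
        raise ValueError("concentration and population must have the same shape")
    valid = ~np.isnan(c) & (p > 0)
    if not valid.any():
        raise ValueError("no valid grid cells")
    return float(np.sum(c[valid] * p[valid]) / np.sum(p[valid]))


def relative_difference_percent(reference, other):
    """Shortfall of `other` relative to `reference`, in percent."""
    return (reference - other) / reference * 100.0


# Hypothetical toy grid: three cells, NO2 in µg/m³ and population in arbitrary units.
toy_no2 = [20.0, 40.0, 60.0]
toy_pop = [1.0, 1.0, 2.0]
toy_pw = population_weighted_mean(toy_no2, toy_pop)

# North China estimates quoted in the abstract: RBE-RF 50.9 and Base-RF 40.2 µg/m³.
gap = relative_difference_percent(50.9, 40.2)
print(f"toy [NO2]pw = {toy_pw:.1f} µg/m³, Base-RF shortfall = {gap:.1f}%")
```

Weighting by population rather than by area makes densely populated, more polluted cells dominate the average, which is why this metric is the one typically used for exposure assessment.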