Summary:

Highlights
• Various statistical learning techniques are compared in country and global models.
• The TROPOMI instrument measurements are evaluated in the models.
• Daytime and nighttime NO2 are modelled separately for exposure assessment.
• The ensemble tree-based methods are promising for global mapping.
• The spatial prediction patterns need to be considered in the validation process.

Abstract

Background
In countries where air pollution stations are unavailable or scarce, station measurements from other countries and atmospheric remote sensing could jointly provide information to estimate ambient air quality at a resolution fine enough to study the relationship between air pollution exposure and health. Predicting NO2 concentration globally with sufficient spatial and temporal resolution and accuracy for health studies is, however, not a trivial task. The challenges are data deficiency, in terms of both NO2 measurements and NO2 predictors, and the development of a statistical model that can capture regional and continental differences, such as traffic regulations, energy sources, and local weather.

Objective
We investigated the feasibility of mapping daytime and nighttime NO2 globally at a high spatial resolution (25 m) by including TROPOMI (TROPOspheric Monitoring Instrument) data and comparing various statistical learning techniques.

Method
We separated daytime (7:00 am - 9:59 pm) and nighttime (10:00 pm - 6:59 am) based on local time. To study whether models should be built for each country separately, we developed national models for four selected countries (the US, China, Germany, and Spain). We built the models for 2017 using 3636 stations.
Seven statistical learning techniques were applied, and the impact of the predictors, model fitting, and prediction accuracy were compared between different techniques, between national models, between national and global models, and between models with and without the NO2 vertical column density retrieved from TROPOMI.

Result and conclusion
The ensemble tree-based methods obtained higher accuracy than the linear regression-based methods in both national and global models. The global tree-based methods obtained accuracy similar to that of the national models. Different spatial prediction patterns are observed even when the prediction accuracy is very similar. Separating day from night can be important for more accurate air pollution exposure assessment. The TROPOMI variable is ranked as one of the most important variables in the statistical learning techniques, but adding it to global models that already contain other, earlier remote sensing products does not improve the prediction accuracy.
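
The day/night separation described in the Method can be illustrated with a minimal Python sketch. Only the two time windows (7:00 am - 9:59 pm and 10:00 pm - 6:59 am, local time) come from the abstract. The function names, the record layout, the fixed UTC offset used to obtain local time, and the averaging step are all assumptions made for illustration, not the authors' procedure.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 7     # 7:00 am local time (window stated in the abstract)
NIGHT_START_HOUR = 22  # 10:00 pm local time (window stated in the abstract)


def period_of_day(local_time: datetime) -> str:
    """Return "day" for 07:00-21:59 local time, otherwise "night"."""
    if DAY_START_HOUR <= local_time.hour < NIGHT_START_HOUR:
        return "day"
    return "night"


def split_day_night(records):
    """Average hourly NO2 values into a daytime and a nighttime mean.

    `records` is a hypothetical layout: an iterable of
    (utc_time, utc_offset_hours, no2_value) tuples. A fixed UTC offset
    is an assumed simplification of "local time".
    """
    sums = {"day": [0.0, 0], "night": [0.0, 0]}
    for utc_time, offset_hours, value in records:
        local_time = utc_time + timedelta(hours=offset_hours)
        key = period_of_day(local_time)
        sums[key][0] += value  # running total
        sums[key][1] += 1      # number of observations
    # Return None for a period that has no observations.
    return {k: (s / n if n else None) for k, (s, n) in sums.items()}
```

A station's hourly series would be passed as `split_day_night(records)`, giving one daytime and one nighttime value per station that could then serve as separate response variables.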