An Ensemble Machine-Learning Model To Predict Historical PM2.5 Concentrations in China from Satellite Data

Qingyang Xiao; Howard H Chang; Guannan Geng; Yang Liu

doi:10.1021/acs.est.8b02917

Environ Sci Technol. 2018 Nov 20;52(22):13260-13269. doi: 10.1021/acs.est.8b02917. Epub 2018 Nov 1.

PMID: 30354085

Abstract

The long satellite aerosol data record enables assessments of historical PM2.5 levels in regions where routine PM2.5 monitoring began only recently. However, most previous models showed reduced accuracy when predicting PM2.5 levels outside the model-training period. In this study, we proposed an ensemble machine-learning approach that provides reliable PM2.5 hindcasts. Missing satellite data were first filled by multiple imputation. The modeling domain, China, was then divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine-learning models, including random forest, a generalized additive model, and extreme gradient boosting, was trained in each region separately. Finally, a generalized additive ensemble model was developed to combine the predictions from the different algorithms. The ensemble prediction characterized the spatiotemporal distribution of daily PM2.5 well, with a cross-validation (CV) R² of 0.79 and an RMSE of 21 μg/m³. The cluster-based subregion models outperformed national models, improving the CV R² by approximately 0.05. Compared with previous studies, our model provided more accurate out-of-range predictions (i.e., predictions outside the training period) at both the daily level (R² = 0.58, RMSE = 29 μg/m³) and the monthly level (R² = 0.76, RMSE = 16 μg/m³). Our hindcast modeling system enables unbiased reconstruction of historical PM2.5 levels.
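The first modeling step fills gaps in the satellite record by multiple imputation. The abstract does not specify the imputation model, so the sketch below uses a deliberately simple variant, assuming aerosol optical depth (AOD) as the satellite variable and one hypothetical covariate: stochastic regression imputation (regression prediction plus Gaussian noise) is repeated m times, and Rubin's rules pool an estimate across the completed datasets. A fully "proper" multiple imputation would also redraw the regression coefficients for each imputation; that step is omitted here. All function names and toy values are illustrative, not the authors' implementation.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def _fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def impute_once(aod: Sequence[Optional[float]], covar: Sequence[float],
                rng: random.Random) -> List[float]:
    """Stochastic regression imputation (hypothetical model): fill each missing AOD
    with its regression prediction from one covariate plus Gaussian noise."""
    pairs = [(c, a) for a, c in zip(aod, covar) if a is not None]
    xs = [c for c, _ in pairs]
    ys = [a for _, a in pairs]
    b0, b1 = _fit_line(xs, ys)
    # Residual SD with n - 2 degrees of freedom (intercept and slope estimated).
    resid_sd = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs) / (len(pairs) - 2))
    return [a if a is not None else b0 + b1 * c + rng.gauss(0.0, resid_sd)
            for a, c in zip(aod, covar)]


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled mean and total variance W + (1 + 1/m) * B."""
    m = len(estimates)
    within = statistics.fmean(variances)      # W: average within-imputation variance
    between = statistics.variance(estimates)  # B: between-imputation variance
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


if __name__ == "__main__":
    rng = random.Random(42)
    aod = [0.21, None, 0.38, 0.52, None, 0.71, 0.80]  # toy values, None = missing
    covar = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]       # hypothetical covariate
    completed = [impute_once(aod, covar, rng) for _ in range(5)]
    # Toy target quantity: the mean AOD and the variance of that mean, per completed dataset.
    means = [statistics.fmean(d) for d in completed]
    variances = [statistics.variance(d) / len(d) for d in completed]
    print(rubin_pool(means, variances))
```

The noise term is what distinguishes multiple imputation from single regression filling: each completed dataset differs, so the between-imputation variance B carries the extra uncertainty caused by the missing values.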
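The abstract describes training several base learners (random forest, a generalized additive model, and extreme gradient boosting) within each spatial cluster and then combining their predictions with a generalized additive ensemble model. As a simplified stand-in, the sketch below fits a linear stacking combiner, obs ≈ b0 + w_1·pred_1 + ... + w_k·pred_k, by ordinary least squares. The authors' GAM ensemble allows nonlinear smooth terms, so this illustrates the combining step rather than reproducing it. The function names, the `(region, base_predictions, obs)` sample layout, and fitting one combiner per region are assumptions for illustration. In practice the base predictions passed to the combiner should be out-of-fold (cross-validated) so the weights are not fitted to in-sample predictions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_stack(base_preds: Sequence[Sequence[float]], obs: Sequence[float]) -> List[float]:
    """OLS weights [b0, w_1, ..., w_k]; base_preds[i][j] is model j's prediction for sample i."""
    rows = [[1.0, *p] for p in base_preds]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(k)]
    return _solve(xtx, xty)  # normal equations (X^T X) beta = X^T y


def predict_stack(coefs: Sequence[float], base_preds: Sequence[Sequence[float]]) -> List[float]:
    """Apply fitted stacking weights to new base-model predictions."""
    return [coefs[0] + sum(w * p for w, p in zip(coefs[1:], row)) for row in base_preds]


def fit_by_region(samples: Iterable[Tuple[str, Sequence[float], float]]) -> Dict[str, List[float]]:
    """Fit one combiner per region label (illustrative per-cluster setup)."""
    groups: Dict[str, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for region, preds, y in samples:
        groups[region][0].append(preds)
        groups[region][1].append(y)
    return {region: fit_stack(x, y) for region, (x, y) in groups.items()}
```

A linear combiner is the simplest form of stacking; replacing each `w_j·pred_j` term with a smooth function of `pred_j` would move it toward the additive ensemble the abstract names.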
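The abstract reports accuracy as cross-validation R² and RMSE, at daily and monthly resolution, and separately for out-of-range predictions outside the training period. The sketch below shows one way to compute these quantities; it is not the authors' evaluation code. It assumes R² is the coefficient of determination (1 - SS_res/SS_tot), whereas some PM2.5 studies instead report the squared correlation from regressing observations on predictions. The record layouts (dicts with a `year` key, and `(site, year, month, obs, pred)` tuples), the helper names, and computing monthly values as simple means of matched daily observations and predictions are all hypothetical choices.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def r2_score(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the inputs (e.g. ug/m3)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def out_of_range_split(records: Iterable[dict], train_years: set) -> Tuple[List[dict], List[dict]]:
    """Hypothetical hindcast split: train on train_years, test on every other year."""
    records = list(records)
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] not in train_years]
    return train, test


def monthly_means(daily: Iterable[Tuple[str, int, int, float, float]]) -> Tuple[List[float], List[float]]:
    """Average matched daily (obs, pred) pairs per (site, year, month), sorted by key."""
    acc: Dict[tuple, list] = defaultdict(lambda: [0.0, 0.0, 0])
    for site, year, month, o, p in daily:
        a = acc[(site, year, month)]
        a[0] += o
        a[1] += p
        a[2] += 1
    keys = sorted(acc)
    return [acc[k][0] / acc[k][2] for k in keys], [acc[k][1] / acc[k][2] for k in keys]


if __name__ == "__main__":
    obs, pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
    print(round(r2_score(obs, pred), 3), round(rmse(obs, pred), 2))  # 0.915 2.38
```

Averaging to monthly values smooths day-to-day noise, which is one reason monthly R² is typically higher than daily R² for the same model.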

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Air Pollutants*
China
Environmental Monitoring
Machine Learning
Particulate Matter*

Substances

Air Pollutants
Particulate Matter