Historical PM2.5 data are essential for assessing the health effects of air pollution exposure in early life and across the life course. However, the lack of high-quality data sources before 2000, such as satellite-based aerosol optical depth, has left a gap in spatiotemporally resolved PM2.5 data for historical periods. Taking the United Kingdom as an example, we used a light gradient boosting machine (LightGBM) to capture the spatiotemporal association between PM2.5 concentrations and multi-source geospatial predictors. PM2.5 values augmented from PM10 measurements expanded the spatiotemporal representativeness of the ground measurements. Observations through 2009 were used to test the models, and those from 2010 onward to train them. At the daily level, our model showed fair prediction accuracy from 2010 to 2019 (coefficient of determination, R2, of 0.71-0.85 in grid-based cross-validation) and commendable back-extrapolation performance from 1998 to 2009 (R2 of 0.32-0.65 in independent external testing). The model also reproduced the pollution episodes of the 1980s and the pollution levels of the 1990s. The four-decade PM2.5 estimates showed that most regions in England experienced significant downward trends in PM2.5 pollution. The methods developed in this study are generalizable to other data-rich regions for historical air pollution exposure assessment.
Keywords: LightGBM; PM2.5; SHAP; U.K.; back extrapolation; exposure analysis; spatiotemporal patterns.
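As a rough illustration of the evaluation design summarized above, the following minimal sketch shows a temporal hold-out split, a grid-based fold assignment, and the R2 calculation. It is not the study's code. The record fields (`year`, `lat`, `lon`, `pm25`), the function names, the 1-degree cell size, the fold count, and the toy values are all assumptions made for illustration. The sketch also assumes, as the reported periods suggest, that 2010-2019 observations are used for fitting and earlier observations for external testing. In practice a LightGBM regressor would be fitted on the training portion; that step is omitted here.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def temporal_split(records, first_train_year=2010):
    """Fit on recent years; hold out earlier years for back-extrapolation testing."""
    train = [r for r in records if r["year"] >= first_train_year]
    test = [r for r in records if r["year"] < first_train_year]
    return train, test


def grid_folds(records, cell_size_deg=1.0, n_folds=5):
    """Group records into folds by grid cell, so that all records falling in
    the same cell are held out together during cross-validation."""
    folds = defaultdict(list)
    for r in records:
        row = int(r["lat"] // cell_size_deg)
        col = int(r["lon"] // cell_size_deg)
        fold = (row * 31 + col) % n_folds  # deterministic cell-to-fold mapping
        folds[fold].append(r)
    return folds


# Toy records with made-up values (hypothetical, not measurements).
records = [
    {"year": 2005, "lat": 51.5, "lon": -0.1, "pm25": 14.0},
    {"year": 2008, "lat": 53.4, "lon": -2.2, "pm25": 12.0},
    {"year": 2012, "lat": 51.6, "lon": -0.2, "pm25": 11.0},
    {"year": 2015, "lat": 51.7, "lon": -0.3, "pm25": 10.0},
    {"year": 2018, "lat": 55.9, "lon": -3.2, "pm25": 7.0},
]

train, test = temporal_split(records)
folds = grid_folds(train)
```

Splitting folds by grid cell rather than by individual record keeps nearby, spatially correlated observations out of the training data when their cell is held out, which gives a more conservative accuracy estimate than a purely random split.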