A regionalized national universal kriging model using Partial Least Squares regression for estimating annual PM2.5 concentrations in epidemiology

Atmos Environ (1994). 2013 Aug 1;75:383-392. doi: 10.1016/j.atmosenv.2013.04.015.


Many cohort studies in environmental epidemiology require accurate modeling and prediction of fine-scale spatial variation in ambient air quality across the U.S. This modeling requires geographic or "land use" regression covariates defined at small spatial scales, together with some degree of spatial smoothing. Furthermore, both the land use regression and the spatial variation in ambient air quality that the regression leaves unexplained should be allowed to vary across the continent, because topography, climate, and sources of air pollution are heterogeneous at large scales. This paper introduces a regionalized national universal kriging model for annual average fine particulate matter (PM2.5) monitoring data across the U.S. To take full advantage of an extensive database of land use covariates, we chose to use the method of Partial Least Squares, rather than variable selection, for the regression component of the model (the "universal" in "universal kriging"). Regression coefficients and residual variogram models are allowed to vary across three regions defined as West Coast, Mountain West, and East. We demonstrate a very high level of cross-validated accuracy of prediction, with an overall R² of 0.88 and well-calibrated predictive intervals. In accord with the spatially varying characteristics of PM2.5 on a national scale and the differing kriging smoothness parameters, the accuracy of the prediction varies by region, with predictive intervals notably wider in the West Coast and Mountain West than in the East.
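The regression component described above can be illustrated with a minimal sketch, which is not the authors' implementation. It fits a single-response Partial Least Squares model (PLS1, via the standard NIPALS deflation) separately within each region, so that the regression coefficients differ by region. The function names (`fit_pls1`, `predict_pls1`, `fit_by_region`), the record layout, and the number of components are assumptions made for illustration; the kriging of the regression residuals with region-specific variograms is deliberately left out.

```python
import math


def fit_pls1(X, y, n_components):
    """Fit single-response PLS (PLS1) by NIPALS on mean-centered data.

    X is a list of covariate rows, y a list of responses. All names here
    are hypothetical and chosen for illustration only.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in covariate space most covariant with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left for the covariates to explain
            break
        w = [v / norm for v in w]
        # Scores, covariate loadings, and the regression of y on the scores.
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict the response at one covariate vector x."""
    xc = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        score = sum(a * b for a, b in zip(xc, w))
        y_hat += q * score
        # Apply the same deflation that was used during fitting.
        xc = [a - score * b for a, b in zip(xc, load)]
    return y_hat


def fit_by_region(records, n_components):
    """Fit one PLS1 model per region.

    records is an iterable of (region, covariates, response) tuples.
    """
    grouped = {}
    for region, x, y in records:
        xs, ys = grouped.setdefault(region, ([], []))
        xs.append(x)
        ys.append(y)
    return {r: fit_pls1(xs, ys, n_components) for r, (xs, ys) in grouped.items()}
```

In a full universal kriging model, the residuals from each regional fit would then be smoothed spatially using that region's variogram; this sketch stops at the regression step.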

Keywords: Ambient air quality; Land use regression; National air quality model; Partial Least Squares; Particulate matter; Universal kriging.