Background: Increasingly ensemble learning-based spatiotemporal models are being used to estimate residential air pollution exposures in epidemiological studies. While these machine learning models typically have improved performance, they suffer from exposure measurement error that is inherent in all models. Our objective is to develop a framework to formally assess shared, multiplicative measurement error (SMME) in our previously published three-stage, ensemble learning-based nitrogen oxides (NOx) model to identify its spatial and temporal patterns and predictors.
Methods: By treating the ensembles as an external dosimetry system, we quantified shared and unshared, multiplicative and additive (SUMA) measurement error components in our exposure model. We used generalized additive models (GAMs) with a smooth term for location to identify geographic locations with significantly elevated SMME and explain their spatial and temporal determinants.
Results: We found evidence of significant shared and unshared multiplicative error (p < 0.0001) in our ensemble-learning based spatiotemporal NOx model predictions. Unshared multiplicative error was 26 times larger than SMME. We observed significant geographic (p < 0.0001) and temporal variation in SMME with the majority (43%) of predictions with elevated SMME occurring in the earliest time-period (1992-2000). Densely populated urban prediction regions with complex air pollution sources generally exhibited highest odds of elevated SMME.
Conclusions: We developed a novel statistical framework to formally evaluate the magnitude and drivers of SMME in ensemble learning-based exposure models. Our framework can be used to inform building future improved exposure models.
Copyright © 2019 The Authors. Published by Elsevier Ltd.. All rights reserved.