Background: Data on causes of death by age and sex are a critical input into health decision-making. Priority setting in public health should be informed not only by the current magnitude of health problems but by trends in them. However, cause of death data are often not available or are subject to substantial problems of comparability. We propose five general principles for cause of death model development, validation, and reporting.
Methods: We detail a specific implementation of these principles that is embodied in an analytical tool - the Cause of Death Ensemble model (CODEm) - which explores a large variety of possible models to estimate trends in causes of death. Possible models are identified using a covariate selection algorithm that yields many plausible combinations of covariates, which are then run through four model classes. The model classes include mixed effects linear models and spatial-temporal Gaussian Process Regression models for cause fractions and death rates. All models for each cause of death are then assessed using out-of-sample predictive validity and combined into an ensemble with optimal out-of-sample predictive performance.
Results: Ensemble models for cause of death estimation outperform any single component model in tests of root mean square error, frequency of predicting correct temporal trends, and achieving 95% coverage of the prediction interval. We present detailed results for CODEm applied to maternal mortality and summary results for several other causes of death, including cardiovascular disease and several cancers.
Conclusions: CODEm produces better estimates of cause of death trends than previous methods and is less susceptible to bias in model specification. We demonstrate the utility of CODEm for the estimation of several major causes of death.