Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models
- PMID: 28056835
- PMCID: PMC5217317
- DOI: 10.1186/s12874-016-0277-1
Abstract
Background: Clinical prediction models (CPMs) are increasingly deployed to support healthcare decisions, but they are derived inconsistently, in part due to limited data. An emerging alternative is to aggregate existing CPMs developed for similar settings and outcomes. This simulation study aimed to investigate the impact of between-population heterogeneity and sample size on aggregating existing CPMs in a defined population, compared with developing a model de novo.
Methods: Simulations were designed to mimic a scenario in which multiple CPMs for a binary outcome had been derived in distinct, heterogeneous populations, with potentially different predictors available in each. We then generated a new 'local' population and compared the performance of CPMs developed for this population by aggregation, using stacked regression, principal component analysis or partial least squares, with redevelopment from scratch using backwards selection and penalised regression.
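As a rough illustration of the aggregation idea, the sketch below applies stacked regression to a small simulated local dataset: the linear predictors of two existing CPMs become the covariates of a logistic model fitted to the local outcome. The abstract does not give the fitting details, so the coefficients in `EXISTING_CPMS`, the simulated local population, the non-negativity constraint on the weights and the projected gradient ascent fitting routine are all assumptions made for illustration, not the study's implementation.

```python
import math
import random

random.seed(2016)


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical existing CPMs: (intercept, {predictor index: coefficient}).
# Each model uses a different subset of the three simulated predictors.
EXISTING_CPMS = [
    (-0.8, {0: 0.7, 1: 0.6}),
    (-1.2, {1: 0.4, 2: -0.5}),
]


def linear_predictor(cpm, x):
    """Linear predictor of one existing CPM for one individual."""
    intercept, coefs = cpm
    return intercept + sum(b * x[j] for j, b in coefs.items())


def simulate_local(n):
    """Simulate a local population with its own (assumed) true coefficients."""
    X, y = [], []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(3)]
        p = expit(-1.0 + 0.8 * x[0] + 0.5 * x[1] - 0.4 * x[2])
        X.append(x)
        y.append(1 if random.random() < p else 0)
    return X, y


def log_likelihood(g0, g, lps, y):
    """Bernoulli log-likelihood of the stacked model."""
    total = 0.0
    for row, yi in zip(lps, y):
        p = expit(g0 + sum(gk * lk for gk, lk in zip(g, row)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def fit_stacked(lps, y, n_iter=1000, lr=0.5):
    """Projected gradient ascent; weights on the CPMs are kept non-negative."""
    n, m = len(y), len(lps[0])
    g0, g = 0.0, [0.0] * m
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * m
        for row, yi in zip(lps, y):
            resid = yi - expit(g0 + sum(gk * lk for gk, lk in zip(g, row)))
            grad0 += resid
            for k in range(m):
                grad[k] += resid * row[k]
        g0 += lr * grad0 / n
        # Projection step: clip negative weights to zero.
        g = [max(0.0, gk + lr * gr / n) for gk, gr in zip(g, grad)]
    return g0, g


X, y = simulate_local(300)
lps = [[linear_predictor(cpm, x) for cpm in EXISTING_CPMS] for x in X]
g0, weights = fit_stacked(lps, y)
risks = [expit(g0 + sum(w * l for w, l in zip(weights, row))) for row in lps]
print("intercept:", round(g0, 3), "weights:", [round(w, 3) for w in weights])
```

Only an intercept and one weight per existing model are estimated from the local data, which is why an approach of this kind can remain usable when the local sample is small.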
Results: While redevelopment approaches resulted in models that were miscalibrated for local datasets of fewer than 500 observations, model aggregation methods were well calibrated across all simulation scenarios. When the local data comprised fewer than 1000 observations and between-population heterogeneity was small, aggregating existing CPMs gave better discrimination and the lowest mean square error in the predicted risks compared with deriving a new model. Conversely, given more than 1000 observations and significant between-population heterogeneity, redevelopment outperformed the aggregation approaches. In all other scenarios, aggregation and de novo derivation resulted in similar predictive performance.
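The three performance dimensions named above (calibration, discrimination and mean square error in the predicted risks) can be computed along the following lines. This is a minimal sketch: the abstract does not state which calibration measure was used, so the observed/expected ratio here is an assumed, deliberately crude stand-in, and the toy risks and outcomes are invented for illustration only.

```python
def c_statistic(risks, outcomes):
    """Discrimination: probability that a randomly chosen event has a higher
    predicted risk than a randomly chosen non-event (ties count as 0.5)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Crude calibration check: observed events over expected events.
    Values near 1 suggest the average predicted risk is about right."""
    return sum(outcomes) / sum(risks)


def mean_square_error(predicted_risks, true_risks):
    """MSE between predicted risks and the true risks, which are known
    only because the data are simulated."""
    n = len(predicted_risks)
    return sum((p - t) ** 2 for p, t in zip(predicted_risks, true_risks)) / n


# Toy values (hypothetical, for illustration only).
predicted = [0.10, 0.40, 0.35, 0.80]
true_risk = [0.20, 0.30, 0.50, 0.70]
observed = [0, 0, 1, 1]

auc = c_statistic(predicted, observed)
oe = observed_expected_ratio(predicted, observed)
mse = mean_square_error(predicted, true_risk)
print(auc, oe, mse)
```

Comparing predicted risks with true risks is only possible in a simulation setting; with real data, calibration and discrimination would have to be assessed against observed outcomes alone.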
Conclusion: This study demonstrates a pragmatic approach to contextualising CPMs to defined populations. When aiming to develop models in defined populations, modellers should consider existing CPMs, with aggregation approaches being a suitable modelling strategy particularly with sparse data on the local population.
Keywords: Clinical prediction models; Computer simulation; Contextual heterogeneity; Model aggregation; Validation.
