Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017 Jan 6;17(1):1.
doi: 10.1186/s12874-016-0277-1.

Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models

Affiliations
Free PMC article

Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models

Glen P Martin et al. BMC Med Res Methodol. .
Free PMC article

Abstract

Background: Clinical prediction models (CPMs) are increasingly deployed to support healthcare decisions but they are derived inconsistently, in part due to limited data. An emerging alternative is to aggregate existing CPMs developed for similar settings and outcomes. This simulation study aimed to investigate the impact of between-population-heterogeneity and sample size on aggregating existing CPMs in a defined population, compared with developing a model de novo.

Methods: Simulations were designed to mimic a scenario in which multiple CPMs for a binary outcome had been derived in distinct, heterogeneous populations, with potentially different predictors available in each. We then generated a new 'local' population and compared the performance of CPMs developed for this population by aggregation, using stacked regression, principal component analysis or partial least squares, with redevelopment from scratch using backwards selection and penalised regression.

Results: While redevelopment approaches resulted in models that were miscalibrated for local datasets of less than 500 observations, model aggregation methods were well calibrated across all simulation scenarios. When the size of local data was less than 1000 observations and between-population-heterogeneity was small, aggregating existing CPMs gave better discrimination and had the lowest mean square error in the predicted risks compared with deriving a new model. Conversely, given greater than 1000 observations and significant between-population-heterogeneity, then redevelopment outperformed the aggregation approaches. In all other scenarios, both aggregation and de novo derivation resulted in similar predictive performance.

Conclusion: This study demonstrates a pragmatic approach to contextualising CPMs to defined populations. When aiming to develop models in defined populations, modellers should consider existing CPMs, with aggregation approaches being a suitable modelling strategy particularly with sparse data on the local population.

Keywords: Clinical prediction models; Computer simulation; Contextual heterogeneity; Model aggregation; Validation.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
Simulation Procedure: A pictorial representation of the simulation procedure for a given value of population heterogeneity, σ, and a given development sample size, n train. This process was then repeated across all combinations of σ and n train
Fig. 2
Fig. 2
Calibration Intercept: Calibration intercept in the validation set for SR, PCA and the two newly derived models across all simulation situations. The PLS results were nearly identical to SR/PCA and so are omitted for clarity. Note: σ = 1 was removed from the plot for clarity since the results quantitatively similar to σ = 0.75
Fig. 3
Fig. 3
Calibration Slope: Calibration slope in the validation set for SR, PCA and the two newly derived models across all simulation situations. The PLS results were nearly identical to SR/PCA and so are omitted for clarity. Note: σ = 1 was removed from the plot for clarity since the results quantitatively similar to σ = 0.75
Fig. 4
Fig. 4
Discrimination: Area under receiver operating characteristic curve (AUC) in the validation set for SR, PCA and the two newly derived models across all scenarios. The PLS results were nearly identical to SR/PCA and so are omitted for clarity. Note: σ = 1 was removed from the plot for clarity since the results quantitatively similar to σ = 0.75

Similar articles

Cited by

References

    1. Kappen TH, Vergouwe Y, van Klei WA, van Wolfswinkel L, Kalkman CJ, Moons KGM. Adaptation of Clinical Prediction Models for Application in Local Settings. Med Decis Mak. 2012;32:E1–E10. doi: 10.1177/0272989X12439755. - DOI - PMC - PubMed
    1. Damen JAAG, Hooft L, Schuit E, Debray TPA, Collins GS, Tzoulaki I, Lassale CM, Siontis GCM, Chiocchia V, Roberts C, Schlüssel MM, Gerry S, Black JA, Heus P, van der Schouw YT, Peelen LM, Moons KGM. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416. doi: 10.1136/bmj.i2416. - DOI - PMC - PubMed
    1. Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605. doi: 10.1136/bmj.b605. - DOI - PubMed
    1. Royston P, Moons KGM, Altman DG, Vergouwe Y. Prognosis and prognostic research: Developing a prognostic model. BMJ. 2009;338:b604. doi: 10.1136/bmj.b604. - DOI - PubMed
    1. Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ. 2009;338:b606. doi: 10.1136/bmj.b606. - DOI - PubMed

Publication types

LinkOut - more resources