Background and objective: The utility of predictive models depends on their external validity, that is, their ability to maintain accuracy when applied to patients and settings different from those on which the models were developed. We report a simulation study that compared the external validity of standard logistic regression (LR1), logistic regression with piecewise-linear and quadratic terms (LR2), classification trees, and neural networks (NNETs).
Methods: We developed predictive models on data simulated from a specified population and on data simulated from perturbed forms of that population, which were not representative of the original distribution. All models were tested on new data generated from the original, unperturbed population.
Results: The performance of LR2 was superior to that of the other model types when the models were developed on data sampled from the population (mean receiver operating characteristic [ROC] areas 0.769, 0.741, 0.724, and 0.682, for LR2, LR1, NNETs, and trees, respectively) and when they were developed on nonrepresentative data (mean ROC areas 0.734, 0.713, 0.703, and 0.667, respectively). However, when the models developed using nonrepresentative data were compared with models developed from data sampled from the population, LR2 had the greatest loss in performance (a decrease of 0.035 in mean ROC area, compared with 0.028, 0.021, and 0.015 for LR1, NNETs, and trees).
Conclusion: Our results highlight the necessity of external validation to test the transportability of predictive models.
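Illustrative sketch: The design described under Methods can be approximated in a few lines of Python. The sketch below is not the study's simulation; the data-generating model, its coefficients, the size of the covariate shift, the sample sizes, and the gradient-descent fit are all assumptions chosen for illustration. It fits a standard logistic regression (comparable to LR1) once on data sampled from a hypothetical population and once on data from a perturbed form of that population, then computes the ROC area of each model on new data generated from the unperturbed population.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rng, x2_mean=0.0):
    """Draw n patients. The outcome model is hypothetical: the true logit is
    linear in x1 and quadratic in x2. x2_mean != 0 gives a perturbed sample."""
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(x2_mean, 1.0)
        p = sigmoid(-1.0 + 1.0 * x1 + 0.8 * x2 * x2)
        y = 1 if rng.random() < p else 0
        rows.append((x1, x2, y))
    return rows


def fit_logistic(rows, steps=200, lr=0.1):
    """Main-effects logistic regression fitted by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for x1, x2, y in rows:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
            g[0] += err
            g[1] += err * x1
            g[2] += err * x2
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


def roc_area(scores, labels):
    """ROC area as the probability that a positive case outscores a negative
    case, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(w, rows):
    """ROC area of the fitted linear predictor on the given data."""
    scores = [w[0] + w[1] * x1 + w[2] * x2 for x1, x2, _ in rows]
    labels = [y for _, _, y in rows]
    return roc_area(scores, labels)


rng = random.Random(0)                      # fixed seed for reproducibility
train_rep = simulate(1000, rng)             # sampled from the population
train_pert = simulate(1000, rng, 1.5)       # perturbed: x2 shifted (assumed)
test_pop = simulate(1000, rng)              # new data from the population

w_rep = fit_logistic(train_rep)
w_pert = fit_logistic(train_pert)
auc_representative = evaluate(w_rep, test_pop)
auc_nonrepresentative = evaluate(w_pert, test_pop)
print("ROC area, developed on representative data:   ", auc_representative)
print("ROC area, developed on nonrepresentative data:", auc_nonrepresentative)
```

A single run such as this yields one pair of ROC areas; averaging over many simulated development sets, as the reported mean ROC areas imply, would require repeating the procedure with different seeds.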