Validation in prediction research: the waste by data splitting

J Clin Epidemiol. 2018 Nov;103:131-133. doi: 10.1016/j.jclinepi.2018.07.010. Epub 2018 Jul 29.


Accurate prediction of medical outcomes is important for diagnosis and prognosis. Major medical journals now commonly require evidence that a prediction model is valid outside its development sample. A common way to meet this requirement is to split the available data at random into a development part and a validation part. Is such data splitting a waste of resources? In large samples, interest should shift to assessing heterogeneity in model performance across settings. In small samples, cross-validation and bootstrapping are more efficient approaches. In conclusion, random data splitting should be abolished for validation of prediction models.
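As a rough illustration of the bootstrap alternative named in the abstract, the sketch below estimates an optimism-corrected c-statistic while using all of the data for model development. It is not taken from the article: the simulated data, the small gradient-ascent logistic fit, and every function name (simulate, fit_logistic, c_statistic, bootstrap_corrected_c) are assumptions made purely for illustration.

```python
import math
import random


def simulate(n=60, k=5, seed=1):
    """Hypothetical data: k standard-normal predictors, only the first is informative."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-row[0])) else 0 for row in X]
    return X, y


def linear_predictor(w, row):
    """Intercept w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(X, y, steps=100, lr=0.5):
    """Fit a logistic model by plain gradient ascent (illustrative, not optimized)."""
    k, n = len(X[0]), len(y)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            z = max(-30.0, min(30.0, linear_predictor(w, row)))  # avoid overflow
            err = yi - 1 / (1 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def c_statistic(scores, y):
    """Concordance: share of event/non-event pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_corrected_c(X, y, n_boot=50, seed=0):
    """Return (apparent c, optimism-corrected c) from bootstrap resampling."""
    rng = random.Random(seed)
    n = len(y)
    w = fit_logistic(X, y)  # final model uses the full sample
    apparent = c_statistic([linear_predictor(w, r) for r in X], y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain a single outcome class
        wb = fit_logistic(Xb, yb)
        c_boot = c_statistic([linear_predictor(wb, r) for r in Xb], yb)
        c_orig = c_statistic([linear_predictor(wb, r) for r in X], y)
        optimism.append(c_boot - c_orig)
    if not optimism:
        return apparent, apparent
    return apparent, apparent - sum(optimism) / len(optimism)


if __name__ == "__main__":
    X, y = simulate()
    apparent, corrected = bootstrap_corrected_c(X, y)
    print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

In this sketch no observations are set aside: the full sample fits the final model, and the bootstrap only estimates how much the apparent performance overstates performance in new data. In a real analysis, every modeling step (including any variable selection) would need to be repeated inside the resampling loop.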

MeSH terms

  • Data Accuracy
  • Diagnosis*
  • Diagnostic Techniques and Procedures
  • Humans
  • Outcome Assessment, Health Care* / methods
  • Outcome Assessment, Health Care* / standards
  • Prognosis*
  • Reproducibility of Results*
  • Research Design
  • Sample Size*
  • Treatment Outcome