Monte Carlo cross-validation for a study with binary outcome and limited sample size

BMC Med Inform Decis Mak. 2022 Oct 17;22(1):270. doi: 10.1186/s12911-022-02016-z.

Abstract

Cross-validation (CV) is a resampling approach to evaluating machine learning models when sample size is limited. The number of possible fold combinations for the training data, known as CV rounds, is often very small, as in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations when computational resources permit for a study with limited sample size. We conduct extensive simulation studies to compare the accuracy of MCCV and CV with the same number of simulations for a study with a binary outcome (e.g., disease progression or not). The accuracy of MCCV is generally higher than that of CV, although the gain is small, and the two approaches have similar performance when sample size is large. Moreover, MCCV provides increasingly reliable performance metrics as the number of simulations increases. Two real examples illustrate the comparison between MCCV and CV.
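The MCCV procedure described in the abstract can be sketched in a few lines: repeatedly draw a random train/test split, fit a model on the training portion, and average the test-set accuracy over many simulations. The sketch below is illustrative only and is not the authors' implementation; the function names, the test fraction, the number of splits, and the toy nearest-centroid classifier are all assumptions chosen for a self-contained example.

```python
import numpy as np

def mccv_accuracy(X, y, fit, predict, test_frac=0.2, n_splits=200, seed=0):
    """Monte Carlo cross-validation (sketch): average accuracy over
    repeated random train/test splits. `fit` and `predict` are
    user-supplied callables; defaults here are illustrative choices."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    accs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)          # fresh random split each round
        test, train = idx[:n_test], idx[n_test:]
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Toy classifier for the binary outcome: nearest class centroid
# (a placeholder, not a method used in the paper).
def fit_centroids(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, X):
    classes = np.array(sorted(model))
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return classes[np.argmin(d, axis=0)]

# Simulated small-sample study: two well-separated Gaussian groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.repeat([0, 1], 40)
acc = mccv_accuracy(X, y, fit_centroids, predict_centroids)
print(round(acc, 3))
```

Unlike k-fold CV, where the number of rounds is fixed by the fold structure, `n_splits` here can be raised freely, which is the flexibility the abstract highlights.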

Keywords: Alzheimer’s disease; Binary outcome; Cross-validation; Machine learning; Monte Carlo cross-validation.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation
  • Humans
  • Machine Learning*
  • Monte Carlo Method
  • Sample Size