Fostering reproducibility and generalizability in machine learning for clinical prediction modeling in spine surgery

Spine J. 2021 Oct;21(10):1610-1616. doi: 10.1016/j.spinee.2020.10.006. Epub 2020 Oct 13.


As machine learning (ML) algorithms are increasingly used to develop clinical prediction models, researchers are becoming more aware of the deleterious effects of the lack of reporting standards. One of the most obvious consequences is the insufficient reproducibility of current prediction models. To characterize methods that improve reproducibility and allow for better clinical performance, we use a previously proposed taxonomy that separates reproducibility into 3 components: technical, statistical, and conceptual reproducibility. Following this framework, we discuss common errors that lead to poor reproducibility, highlight the importance of generalizability when evaluating an ML model's performance, and provide suggestions for optimizing generalizability to ensure adequate performance. These efforts are necessary before such models are applied to patient care.
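As a rough illustration of two ideas named above, the following Python sketch is not taken from the article. It assumes that technical reproducibility can be approximated by fixing the random seed so a rerun gives identical results, and that overfitting shows up as a gap between apparent (training) accuracy and accuracy on held-out data. The synthetic data, the 1-nearest-neighbor "model", and all function names are hypothetical choices made only for this example.

```python
import random


def make_data(n, rng, noise=0.3):
    """Synthetic 1-D data (hypothetical): label is 1 when x > 0.5,
    then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Memorizing model: return the label of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` labeled correctly by the 1-NN model."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


def run(seed):
    # A fixed seed makes data generation, and therefore the results, repeatable.
    rng = random.Random(seed)
    train = make_data(200, rng)
    held_out = make_data(200, rng)
    return accuracy(train, train), accuracy(train, held_out)


train_acc, held_out_acc = run(seed=0)
print(f"apparent (training) accuracy: {train_acc:.2f}")
print(f"held-out accuracy:            {held_out_acc:.2f}")
```

Because the model memorizes its training points, the apparent accuracy is perfect while the held-out accuracy is noticeably lower; reporting only the former would overstate how the model might perform on new patients. Calling `run` again with the same seed returns the same pair of numbers.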

Keywords: Machine learning; Overfitting; Predictive modeling; Reproducibility.

MeSH terms

  • Algorithms*
  • Humans
  • Machine Learning*
  • Reproducibility of Results