We review the concept of overfitting, a well-known concern in the machine learning community but less established in the clinical community. Overfitted models may lead to inadequate conclusions that could wrongly, or even harmfully, shape clinical decision-making. Overfitting can be defined as the difference between discriminatory performance on the training and testing data: while it is normal for out-of-sample performance to be equal to, or slightly worse than, training performance in any adequately fitted model, a markedly worse out-of-sample performance suggests relevant overfitting. We delve into resampling methods, specifically recommending k-fold cross-validation and bootstrapping, to arrive at realistic estimates of out-of-sample error during training. We also encourage the use of regularization techniques such as L1 or L2 regularization, and the choice of a level of algorithm complexity appropriate to the dataset at hand. Data leakage is addressed, and we discuss the importance of external validation to assess true out-of-sample performance and, upon successful external validation, to release the model into clinical practice. Finally, for high-dimensional datasets, the concepts of feature reduction using principal component analysis (PCA) and feature elimination using recursive feature elimination (RFE) are elucidated.
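As a minimal sketch of the resampling and regularization ideas above, the following compares apparent (training) discrimination against a k-fold cross-validated estimate for an L2-regularized logistic regression. It assumes scikit-learn, and synthetic data stands in for a real clinical dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a clinical dataset: 500 patients, 30 predictors.
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

# L2 (ridge) penalty; C is the inverse regularization strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

# Apparent (training) performance: optimistic by construction.
model.fit(X, y)
train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# 5-fold cross-validated AUC: a more realistic out-of-sample estimate.
cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

print(f"training AUC: {train_auc:.3f}, cross-validated AUC: {cv_auc:.3f}")
```

A large gap between the two numbers, in the sense defined above, would suggest relevant overfitting; here the regularized model keeps them close.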
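The two strategies for high-dimensional data mentioned above can likewise be sketched briefly, again assuming scikit-learn and synthetic data in place of a real clinical dataset: PCA projects many correlated predictors onto a few components, while RFE recursively discards the least important original features.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional stand-in: 300 patients, 100 predictors.
X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           random_state=0)

# Feature reduction: project onto the top 10 principal components.
X_pca = PCA(n_components=10).fit_transform(X)

# Feature elimination: recursively drop the least important predictors
# (judged by logistic regression coefficients) until 10 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
X_rfe = X[:, rfe.support_]

print(X_pca.shape, X_rfe.shape)  # both reduce 100 columns to 10
```

Note the difference in interpretability: RFE retains a subset of the original, clinically meaningful predictors, whereas PCA components are linear mixtures of all of them.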
Keywords: Artificial intelligence; Clinical prediction model; Machine intelligence; Machine learning; Prediction; Prognosis.
© 2022. The Author(s), under exclusive license to Springer Nature Switzerland AG.