External Validation of a Machine Learning Algorithm for Predicting Clinically Meaningful Functional Improvement After Arthroscopic Hip Preservation Surgery

Kyle N Kunze; Austin Kaidi; Sophia Madjarova; Evan M Polce; Anil S Ranawat; Danyal H Nawabi; Bryan T Kelly; Shane J Nho; Benedict U Nwachukwu

doi:10.1177/03635465221124275

Am J Sports Med. 2022 Nov;50(13):3593-3599. doi: 10.1177/03635465221124275. Epub 2022 Sep 22.

Authors

Kyle N Kunze¹,², Austin Kaidi¹,², Sophia Madjarova², Evan M Polce³, Anil S Ranawat¹,², Danyal H Nawabi¹,², Bryan T Kelly¹,², Shane J Nho⁴, Benedict U Nwachukwu¹,²

Affiliations

¹ Department of Orthopedic Surgery, Hospital for Special Surgery, New York, New York, USA.
² Sports Medicine and Shoulder Institute, Hospital for Special Surgery, New York, New York, USA.
³ School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA.
⁴ Section of Young Adult Hip Surgery, Division of Sports Medicine, Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois, USA.

PMID: 36135373
DOI: 10.1177/03635465221124275

Abstract

Background: Individualized risk prediction has become possible with machine learning (ML), which may have important implications for clinical decision making. We previously developed an ML algorithm to predict propensity for clinically meaningful outcome improvement after hip arthroscopy for femoroacetabular impingement syndrome. External validation of prognostic models is critical for determining generalizability, although it is rarely performed.

Purpose: To assess the external validity of an ML algorithm for predicting clinically meaningful improvement after hip arthroscopy.

Study design: Cohort study; Level of evidence, 3.

Methods: An independent hip preservation registry at a tertiary academic medical center was queried for consecutive patients/athletes who underwent hip arthroscopy for femoroacetabular impingement syndrome between 2015 and 2017. By assuming a minimal clinically important difference (MCID) outcome/event proportion of 75% based on the original study, a minimum sample of 132 patients was required. In total, 154 patients were included. Age, body mass index, alpha angle on anteroposterior pelvic radiographs, Tönnis grade and angle, and preoperative Hip Outcome Score-Sports Subscale were used as model inputs to predict the MCID for the Hip Outcome Score-Sports Subscale 2 years postoperatively. Performance was assessed using identical metrics to the internal validation study and included discrimination, calibration, Brier score, and decision curve analysis.
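
As an illustration of two of the metrics named above, the following is a minimal sketch of how discrimination (concordance statistic) and the Brier score, together with a null-model Brier score, might be computed from observed outcomes and predicted probabilities. The function names and the toy data are assumptions made for illustration; they are not the study's code or data, and calibration slope and intercept are omitted.

```python
import numpy as np

def c_statistic(y_true, y_prob):
    """Concordance statistic: the probability that a randomly chosen patient
    who achieved the outcome received a higher predicted probability than a
    randomly chosen patient who did not (ties count as one half)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all event vs non-event pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def null_brier_score(y_true):
    """Brier score of a model that predicts the observed event proportion
    for every patient; algebraically equal to p * (1 - p)."""
    p = float(np.mean(y_true))
    return p * (1.0 - p)

# Hypothetical toy data (1 = achieved the MCID, 0 = did not).
y = np.array([1, 1, 1, 0])
p_hat = np.array([0.9, 0.8, 0.4, 0.5])

print(c_statistic(y, p_hat))
print(brier_score(y, p_hat))
print(null_brier_score(y))
```

A model is informative on this scale when its Brier score falls below the null-model value; a concordance statistic of 0.5 corresponds to chance-level discrimination.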

Results: The concordance statistic in the validation cohort was 0.80 (95% CI, 0.71 to 0.87), suggesting good to excellent discrimination. The calibration slope was 1.16 (95% CI, 0.74 to 1.61) and the calibration intercept was 0.13 (95% CI, -0.26 to 0.53). The Brier score was 0.15 (95% CI, 0.12 to 0.18), compared with a null model Brier score of 0.20. Decision curve analysis revealed favorable net treatment benefit for patients with use of the algorithm as compared with interventional changes made for all patients or for no patients.
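
The comparison underlying a decision curve can be sketched with the standard net benefit formula, in which true positives are credited and false positives are penalized by the odds of the chosen threshold probability. The sketch below is an assumption-laden illustration rather than the study's analysis: the function names, the threshold, and the toy data are hypothetical.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability is at
    or above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = y_true.size
    flagged = y_prob >= threshold
    tp = np.sum(flagged & (y_true == 1))
    fp = np.sum(flagged & (y_true == 0))
    return float(tp / n - fp / n * threshold / (1.0 - threshold))

def net_benefit_all(y_true, threshold):
    """Net benefit of the default strategy of acting on every patient."""
    p = float(np.mean(y_true))
    return p - (1.0 - p) * threshold / (1.0 - threshold)

# Acting on no patients has a net benefit of 0 at every threshold.
NET_BENEFIT_NONE = 0.0

# Hypothetical toy data (1 = achieved the MCID, 0 = did not).
y = np.array([1, 1, 1, 0])
p_hat = np.array([0.9, 0.8, 0.4, 0.5])

print(net_benefit(y, p_hat, 0.6))
print(net_benefit_all(y, 0.6))
```

A decision curve repeats this calculation across a range of threshold probabilities; the model is favored wherever its net benefit exceeds both default strategies.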

Conclusion: In an independent patient population in the northeast region of the United States, this algorithm demonstrated superior discrimination to, and calibration comparable with, that observed in the derivation cohort. The external validation of this algorithm suggests that it is a reliable method to predict propensity for clinically meaningful improvement after hip arthroscopy and is an essential step toward its initial use in clinical practice. Potential uses include integration into electronic medical records for automated prediction, enhanced shared decision making, and more informed allocation of resources to optimize patient outcomes.

Keywords: external validation; femoroacetabular impingement syndrome; hip arthroscopy; machine learning; minimal clinically important difference.

MeSH terms

Activities of Daily Living
Algorithms
Arthroscopy
Child, Preschool
Cohort Studies
Femoracetabular Impingement* / surgery
Humans
Machine Learning
Treatment Outcome