Prediction of quality-of-life improvement after total hip arthroplasty: a simplified and internally validated model based on 82,526 total hip arthroplasties from the Swedish Arthroplasty Register

Bone Jt Open. 2025 Nov 21;6(11):1504-1514. doi: 10.1302/2633-1462.611.BJO-2025-0138.R1.

Abstract

Aims: Pain and poor health-related quality of life are the primary indications for elective primary total hip arthroplasty (THA). However, it remains challenging to predict whether THA will deliver the improvements patients anticipate. Our study aimed to develop and validate statistical and machine learning models to predict clinical improvement in patient-reported outcome measures (PROMs) one year after elective THA.

Methods: We included 82,526 patients with primary elective THAs from the Swedish Arthroplasty Register (SAR) to predict one-year improvement in the EuroQol five-dimension questionnaire (EQ-5D) index, the EQ visual analogue scale (EQ-VAS), and a combined EQ-5D/EQ-VAS outcome. Two minimal clinically important difference (MCID) thresholds were applied to the EQ-5D index: 0.196, derived from the standardized response mean (SRM) approach, and 0.428, derived from the capacity of benefit (CoB) approach. The MCID for the EQ-VAS was set at 7.81. The models were trained on 21 features. To avoid biased estimates, we excluded cases with missing data. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), and feature importance was assessed for the best-performing algorithm.
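The labelling and evaluation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes "improved" means that the one-year score minus the preoperative score is at least the MCID (the abstract does not say whether the cut-off is inclusive), that the combined outcome requires both EQ-5D and EQ-VAS improvement, and that labels are coded 0/1. The names `PromPair`, `improvement_labels`, and `auc` are hypothetical. AUC is computed with the standard Mann-Whitney rank formula, which is equivalent to the area under the ROC curve but is not necessarily the software the study used.

```python
from dataclasses import dataclass
from typing import Sequence

# MCID thresholds quoted in the abstract
MCID_EQ5D_SRM = 0.196  # EQ-5D index, standardized response mean approach
MCID_EQ5D_COB = 0.428  # EQ-5D index, capacity of benefit approach
MCID_EQVAS = 7.81      # EQ-VAS


@dataclass
class PromPair:
    """Hypothetical container: preoperative and one-year scores for one THA."""
    eq5d_pre: float
    eq5d_1y: float
    vas_pre: float
    vas_1y: float


def improvement_labels(p: PromPair, eq5d_mcid: float = MCID_EQ5D_SRM) -> dict:
    """Binary outcome labels; assumes 'improved' means change >= MCID."""
    eq5d_ok = (p.eq5d_1y - p.eq5d_pre) >= eq5d_mcid
    vas_ok = (p.vas_1y - p.vas_pre) >= MCID_EQVAS
    # Assumed definition of the combined outcome: both criteria must be met
    return {"eq5d": eq5d_ok, "eqvas": vas_ok, "combined": eq5d_ok and vas_ok}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum formula (ties get mid-ranks).

    Assumes y_true contains only 0 (not improved) and 1 (improved).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With this definition, a patient whose EQ-5D index rises by 0.40 counts as improved under the SRM threshold but not under the stricter CoB threshold, which illustrates why the CoB cut-off yields lower improvement rates. The rank-based AUC runs in O(n log n), which matters for a cohort of this size; a pairwise comparison of every positive with every negative would be quadratic.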

Results: Applying the SRM-based MCID, approximately two-thirds of patients reported one-year improvement in the EQ-5D index (66.3%) and the EQ-VAS (69.1%). The improvement rate decreased to 51.7% when improvement in both outcomes was required. The higher CoB cut-off for the EQ-5D index yielded lower rates (approximately 40% for the EQ-5D index and 31.3% for the combined measure). The gradient boosting machine (GBM) consistently outperformed the other models, albeit by a narrow margin, in predicting clinically meaningful one-year PROM improvement, and achieved good to excellent discrimination (AUC range 0.80 to 0.97). Preoperative PROMs (EQ-5D index, EQ-VAS, and Charnley Hip Score), together with age, collectively accounted for over 80% of the feature importance in the ensemble GBM model.
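A statement such as "over 80% of the feature importance" is typically obtained by normalizing per-feature importances so they sum to one and adding up the shares of a group of features. The sketch below shows that arithmetic under the assumption that the GBM reports non-negative, gain-style importances; the abstract does not state which importance measure was used. The function `group_importance_share` and the feature names in `TOP_FEATURES` are hypothetical and do not reflect the register's variable names.

```python
from typing import Iterable, Mapping


def group_importance_share(importances: Mapping[str, float], group: Iterable[str]) -> float:
    """Fraction of total importance attributed to the features in `group`.

    Assumes non-negative importances (e.g. gain-based); they are normalized
    by their sum, so raw or already-normalized values both work.
    """
    if any(v < 0 for v in importances.values()):
        raise ValueError("expects non-negative importances")
    total = sum(importances.values())
    if total == 0:
        raise ValueError("total importance is zero")
    members = set(group)
    missing = members - set(importances)
    if missing:
        raise KeyError(f"unknown features: {sorted(missing)}")
    return sum(v for k, v in importances.items() if k in members) / total


# Hypothetical names for the four predictors the abstract singles out
TOP_FEATURES = ("pre_eq5d_index", "pre_eq_vas", "pre_charnley", "age")
```

Because the share depends on the importance measure (gain, split count, permutation loss, and so on), the same model can yield different group shares; reporting which measure was used makes such a figure reproducible.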

Conclusion: Using a Swedish cohort, we developed an interpretable machine learning model that may facilitate personalized assessment of the likelihood of meaningful clinical improvement after elective THA.