Background: Radiographic severity and clinical severity of knee osteoarthritis (OA) are often dissociated. Artificial intelligence (AI) assistance has been shown to increase inter-rater reliability in radiographic OA diagnosis. We therefore compared AI-aided and AI-unaided radiographic diagnoses with regard to their correlation with clinical severity.
Methods: Seventy-one knee X-rays in DICOM format (m/f = 27:42; mean age: 27.86 ± 6.5 years) were used for AI analysis (KOALA software, IB Lab GmbH). Subjects were recruited from a physiotherapy trial (MLKOA). At baseline, each subject received (i) a knee X-ray and (ii) an assessment of five main scores: Tegner Activity Scale (TAS), Knee Injury and Osteoarthritis Outcome Score (KOOS), International Physical Activity Questionnaire, Star Excursion Balance Test, and Six-Minute Walk Test. Clinical assessments were repeated three times (weeks 6, 12 and 24). Three physicians graded the X-rays according to the Kellgren-Lawrence (KL) classification, both with and without AI assistance. We analyzed (i) the inter-rater reliability (IRR) and (ii) the Spearman correlation between each individual rater's overall KL score and the clinical scores.
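The correlation step described above can be illustrated with a minimal sketch. It assumes Spearman's rho is computed as the Pearson correlation of average ranks (the standard tie-aware definition); the function names and all data values below are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current run of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration data (NOT study data): KL grades (0-4) from one
# rater without and with AI assistance, and a KOOS value per subject.
kl_unaided = [0, 1, 1, 2, 3, 2, 0, 4]
kl_aided = [0, 1, 2, 2, 3, 3, 1, 4]
koos = [95.0, 88.0, 80.0, 72.0, 60.0, 65.0, 90.0, 40.0]

print("rho (unaided):", round(spearman_rho(kl_unaided, koos), 3))
print("rho (aided):  ", round(spearman_rho(kl_aided, koos), 3))
```

Because higher KL grades indicate more severe radiographic OA while higher KOOS values indicate fewer symptoms, a stronger association would appear here as a more negative rho. In this sketch the comparison would be repeated per rater and per clinical score.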
Results: We found that AI-aided diagnostic ratings had a higher association with the overall KL score and the KOOS than AI-unaided ratings. The degree of improvement due to AI depended on the individual rater.
Conclusion: AI-guided systems can improve the rating of knee radiographs, and AI-aided ratings show a stronger association with clinical severity. These results were influenced by the individual reader. Thus, AI training amongst physicians might need to be increased. KL grading might be insufficient as a single tool for knee OA diagnosis.
Keywords: artificial intelligence; clinical severity scores; knee osteoarthritis; knee radiographs.