Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence

Radiology. 2022 Mar;302(3):627-636. doi: 10.1148/radiol.210937. Epub 2021 Dec 21.


Background: Missed fractures are a common cause of diagnostic discrepancy between initial radiographic interpretation and the final read by board-certified radiologists.

Purpose: To assess the effect of assistance by artificial intelligence (AI) on the diagnostic performance of physicians for fractures on radiographs.

Materials and Methods: This retrospective diagnostic study used the multi-reader, multi-case methodology based on an external multicenter data set of 480 examinations with at least 60 examinations per body region (foot and ankle, knee and leg, hip and pelvis, hand and wrist, elbow and arm, shoulder and clavicle, rib cage, and thoracolumbar spine) between July 2020 and January 2021. Fracture prevalence was set at 50%. The ground truth was determined by two musculoskeletal radiologists, with discrepancies resolved by a third. Twenty-four readers (radiologists, orthopedists, emergency physicians, physician assistants, rheumatologists, family physicians) read the whole validation data set (n = 480) with and without AI assistance, with a 1-month minimum washout period. The primary analysis was designed to demonstrate superiority of per-patient sensitivity and noninferiority of per-patient specificity at a -3% margin with AI aid. Stand-alone AI performance was also assessed using receiver operating characteristic curves.

Results: A total of 480 patients were included (mean age, 59 years ± 16 [standard deviation]; 327 women). The sensitivity per patient was 10.4 percentage points higher (95% CI: 6.9, 13.9; P < .001 for superiority) with AI aid (4331 of 5760 readings, 75.2%) than without AI aid (3732 of 5760 readings, 64.8%). The specificity per patient with AI aid (5504 of 5760 readings, 95.6%) was noninferior to that without AI aid (5217 of 5760 readings, 90.6%), with a difference of +5.0 percentage points (95% CI: +2.0, +8.0; P = .001 for noninferiority). AI shortened the average reading time by 6.3 seconds per examination (95% CI: -12.5, -0.1; P = .046). The gain in per-patient sensitivity was significant in all regions (+8.0% to +16.2%; P < .05) except shoulder and clavicle and spine (+4.2% and +2.6%; P = .12 and .52, respectively).

Conclusion: AI assistance improved the sensitivity and may even improve the specificity of fracture detection by radiologists and nonradiologists, without lengthening reading time.

Published under a CC BY 4.0 license. Online supplemental material is available for this article. See also the editorial by Link and Pedoia in this issue.
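The headline percentages can be checked from the reading counts given in the Results. The following is a minimal sketch, not the study's statistical analysis: it pools readings across readers, whereas the reported confidence intervals and P values come from the multi-reader, multi-case analysis and are copied here rather than recomputed. The function names (`readings_per_class`, `rate`, `is_superior`, `is_noninferior`) are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Design figures stated in the abstract.
READERS = 24
EXAMS = 480
PREVALENCE = Fraction(1, 2)   # fracture prevalence was set at 50%
NI_MARGIN = -3.0              # noninferiority margin for specificity, in points


def readings_per_class(readers, exams, prevalence):
    """Return (fracture readings, non-fracture readings) per reading condition."""
    positives = int(exams * prevalence)
    negatives = exams - positives
    return readers * positives, readers * negatives


def rate(numerator, denominator):
    """Pooled proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


def is_superior(ci_lower):
    """Superiority holds when the CI of the difference lies entirely above 0."""
    return ci_lower > 0.0


def is_noninferior(ci_lower, margin):
    """Noninferiority holds when the CI lower bound lies above the margin."""
    return ci_lower > margin


pos_readings, neg_readings = readings_per_class(READERS, EXAMS, PREVALENCE)

# Numerators are the reading counts reported in the Results.
sens_ai = rate(4331, pos_readings)
sens_no_ai = rate(3732, pos_readings)
spec_ai = rate(5504, neg_readings)
spec_no_ai = rate(5217, neg_readings)

sens_gain = sens_ai - sens_no_ai
spec_gain = spec_ai - spec_no_ai

# CI lower bounds are taken from the abstract (6.9 and +2.0), not derived here.
sens_superior = is_superior(6.9)
spec_noninferior = is_noninferior(2.0, NI_MARGIN)

print(f"readings per class: {pos_readings} / {neg_readings}")
print(f"sensitivity: {sens_ai:.1f}% vs {sens_no_ai:.1f}% (gain {sens_gain:.1f})")
print(f"specificity: {spec_ai:.1f}% vs {spec_no_ai:.1f}% (gain {spec_gain:.1f})")
print(f"superior: {sens_superior}, noninferior: {spec_noninferior}")
```

The denominators follow from the design: 24 readers times 240 fracture examinations gives 5760 readings for sensitivity, and the same count of non-fracture readings for specificity. Because the lower bound of the specificity interval (+2.0) also lies above 0, the result is consistent with the Conclusion's statement that AI "may even improve" specificity.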

MeSH terms

  • Artificial Intelligence*
  • Datasets as Topic
  • Diagnostic Errors / prevention & control*
  • Female
  • Fractures, Bone / diagnostic imaging*
  • Humans
  • Male
  • Middle Aged
  • Quality Improvement*
  • Radiographic Image Interpretation, Computer-Assisted / methods*
  • Retrospective Studies
  • Sensitivity and Specificity