Diagnostic power of ChatGPT 4 in distal radius fracture detection through wrist radiographs

Arch Orthop Trauma Surg. 2024 May;144(5):2461-2467. doi: 10.1007/s00402-024-05298-2. Epub 2024 Apr 5.

Abstract

Distal radius fractures rank among the most prevalent fractures in humans, necessitating accurate radiological imaging and interpretation for optimal diagnosis and treatment. In addition to human radiologists, artificial intelligence systems are increasingly employed for radiological assessments. Since 2023, ChatGPT 4 has offered image analysis capabilities, which can also be used to analyze wrist radiographs. This study evaluates the diagnostic power of ChatGPT 4 in identifying distal radius fractures, comparing it with a board-certified radiologist, a hand surgery resident, a medical student, and the well-established AI system Gleamer BoneView™. Results demonstrate good diagnostic accuracy for ChatGPT 4 (sensitivity 0.88, specificity 0.98, diagnostic power (AUC) 0.93), significantly surpassing the medical student (sensitivity 0.98, specificity 0.72, diagnostic power (AUC) 0.85; p = 0.04). Nevertheless, the diagnostic power of ChatGPT 4 lags behind that of the hand surgery resident (sensitivity 0.99, specificity 0.98, diagnostic power (AUC) 0.985; p = 0.014) and Gleamer BoneView™ (sensitivity 1.00, specificity 0.98, diagnostic power (AUC) 0.99; p = 0.006). This study highlights the utility and potential applications of artificial intelligence in modern medicine, emphasizing ChatGPT 4 as a valuable tool for enhancing diagnostic capabilities in the field of medical imaging.
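
The abstract does not state how the AUC values were computed. One observation, offered as an assumption rather than the authors' documented method: every reported AUC equals the mean of the corresponding sensitivity and specificity. This is what the area under a ROC curve reduces to when a reader gives only a binary fracture / no-fracture call, so that the curve has a single operating point. The minimal Python sketch below checks this against the figures quoted above; the names `single_point_auc` and `reported` are illustrative and do not come from the paper.

```python
def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Area under the ROC polyline (0,0) -> (1 - specificity, sensitivity) -> (1,1).

    Computed with the trapezoidal rule; algebraically this simplifies to
    (sensitivity + specificity) / 2.
    """
    fpr = 1.0 - specificity                          # false positive rate at the operating point
    left = 0.5 * fpr * sensitivity                   # triangle from (0,0) to the operating point
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)  # trapezoid from the operating point to (1,1)
    return left + right


# (sensitivity, specificity, AUC) exactly as quoted in the abstract.
reported = {
    "ChatGPT 4": (0.88, 0.98, 0.93),
    "Medical student": (0.98, 0.72, 0.85),
    "Hand surgery resident": (0.99, 0.98, 0.985),
    "Gleamer BoneView": (1.00, 0.98, 0.99),
}

for reader, (sens, spec, auc) in reported.items():
    computed = single_point_auc(sens, spec)
    agrees = abs(computed - auc) < 1e-9
    print(f"{reader}: computed {computed:.3f}, reported {auc}, agrees: {agrees}")
```

If this reading is correct, "diagnostic power (AUC)" here is equivalent to balanced accuracy, and it would not reflect performance across multiple decision thresholds.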

Keywords: Artificial intelligence; ChatGPT; Distal radius fracture; Fracture detection; Hand surgery; Radiology.

MeSH terms

  • Adult
  • Aged
  • Artificial Intelligence
  • Female
  • Humans
  • Male
  • Middle Aged
  • Radiography / methods
  • Radius Fractures* / diagnostic imaging
  • Sensitivity and Specificity
  • Wrist Fractures
  • Wrist Injuries / diagnostic imaging