Diagnostic power of ChatGPT 4 in distal radius fracture detection through wrist radiographs

Sinan Mert; Patrick Stoerzer; Johannes Brauer; Benedikt Fuchs; Elisabeth M Haas-Lützenberger; Wolfram Demmer; Riccardo E Giunta; Tim Nuernberger

doi:10.1007/s00402-024-05298-2

Arch Orthop Trauma Surg. 2024 May;144(5):2461-2467. doi: 10.1007/s00402-024-05298-2. Epub 2024 Apr 5.

Authors

Sinan Mert¹, Patrick Stoerzer², Johannes Brauer², Benedikt Fuchs², Elisabeth M Haas-Lützenberger², Wolfram Demmer², Riccardo E Giunta², Tim Nuernberger²

Affiliations

¹ Division of Hand, Plastic and Aesthetic Surgery, LMU University Hospital, LMU Munich, 80336, München, Germany. Sinan.Mert@med.uni-muenchen.de.
² Division of Hand, Plastic and Aesthetic Surgery, LMU University Hospital, LMU Munich, 80336, München, Germany.

Abstract

Distal radius fractures are among the most common fractures in humans, and optimal diagnosis and treatment depend on accurate radiological imaging and interpretation. In addition to human radiologists, artificial intelligence systems are increasingly employed for radiological assessments. Since 2023, ChatGPT 4 has offered image analysis capabilities, which can also be applied to wrist radiographs. This study evaluates the diagnostic power of ChatGPT 4 in identifying distal radius fractures, comparing it with a board-certified radiologist, a hand surgery resident, a medical student, and the well-established AI Gleamer BoneView™. The results show good diagnostic accuracy for ChatGPT 4 (sensitivity 0.88, specificity 0.98, diagnostic power (AUC) 0.93), significantly surpassing the medical student (sensitivity 0.98, specificity 0.72, diagnostic power (AUC) 0.85; p = 0.04). Nevertheless, the diagnostic power of ChatGPT 4 lags behind that of the hand surgery resident (sensitivity 0.99, specificity 0.98, diagnostic power (AUC) 0.985; p = 0.014) and Gleamer BoneView™ (sensitivity 1.00, specificity 0.98, diagnostic power (AUC) 0.99; p = 0.006). This study highlights the utility and potential applications of artificial intelligence in modern medicine, emphasizing ChatGPT 4 as a valuable tool for enhancing diagnostic capabilities in the field of medical imaging.
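The AUC values quoted above are numerically consistent with the area under a single-operating-point ROC curve, which for a binary fracture/no-fracture call reduces to (sensitivity + specificity) / 2. The abstract does not state how the AUC was computed, so the following is only a minimal sketch under that assumption; the function name `single_point_auc` and the dictionary layout are illustrative and not taken from the study.

```python
def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Area under the ROC polyline (0,0) -> (1 - specificity, sensitivity) -> (1,1).

    For a reader who gives one binary call per image, this area equals the
    mean of sensitivity and specificity (the balanced accuracy).
    """
    return (sensitivity + specificity) / 2


# (sensitivity, specificity) pairs as reported in the abstract.
reported = {
    "ChatGPT 4": (0.88, 0.98),
    "Medical student": (0.98, 0.72),
    "Hand surgery resident": (0.99, 0.98),
    "Gleamer BoneView": (1.00, 0.98),
}

# Assumption: AUC is derived from a single operating point per reader.
auc_by_reader = {
    name: round(single_point_auc(se, sp), 3)
    for name, (se, sp) in reported.items()
}

for name, auc in auc_by_reader.items():
    print(f"{name}: AUC = {auc}")
```

Under this assumption the computed values match the reported ones (0.93, 0.85, 0.985, and 0.99), but the p-values for the pairwise comparisons cannot be reproduced from the abstract alone, since they depend on the per-image readings.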

Keywords: Artificial intelligence; ChatGPT; Distal radius fracture; Fracture detection; Hand surgery; Radiology.

MeSH terms

Adult
Aged
Artificial Intelligence
Female
Humans
Male
Middle Aged
Radiography / methods
Radius Fractures* / diagnostic imaging
Sensitivity and Specificity
Wrist Fractures
Wrist Injuries / diagnostic imaging