Performance of ChatGPT in Diagnosis of Corneal Eye Diseases

Mohammad Delsoz; Yeganeh Madadi; Hina Raja; Wuqaas M Munir; Brendan Tamm; Shiva Mehravaran; Mohammad Soleimani; Ali Djalilian; Siamak Yousefi

doi:10.1097/ICO.0000000000003492

Cornea. 2024 May 1;43(5):664-670. doi: 10.1097/ICO.0000000000003492. Epub 2024 Feb 23.

Authors

Mohammad Delsoz¹, Yeganeh Madadi¹, Hina Raja¹, Wuqaas M Munir², Brendan Tamm², Shiva Mehravaran³, Mohammad Soleimani⁴ ⁵, Ali Djalilian⁴, Siamak Yousefi¹ ⁶

Affiliations

¹ Department of Ophthalmology, Hamilton Eye Institute, University of Tennessee Health Science Center, Memphis, TN.
² Department of Ophthalmology and Visual Sciences, University of Maryland School of Medicine, Baltimore, MD.
³ Department of Biology, School of Computer, Mathematical, and Natural Sciences, Morgan State University, Baltimore, MD.
⁴ Department of Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, IL.
⁵ Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
⁶ Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN.

PMID: 38391243

Abstract

Purpose: The aim of this study was to assess the capabilities of ChatGPT-4.0 and ChatGPT-3.5 for diagnosing corneal eye diseases based on case reports and to compare their performance with that of human experts.

Methods: We randomly selected 20 cases of corneal diseases, including corneal infections, dystrophies, and degenerations, from a publicly accessible online database from the University of Iowa. We then input the text of each case description into ChatGPT-4.0 and ChatGPT-3.5 and asked for a provisional diagnosis. Finally, we evaluated the responses against the correct diagnoses, compared them with the diagnoses made by 3 corneal specialists (human experts), and evaluated interobserver agreements.

Results: The provisional diagnosis accuracy of ChatGPT-4.0 was 85% (17 of 20 cases correct), whereas the accuracy of ChatGPT-3.5 was 60% (12 of 20 cases correct). The accuracies of the 3 corneal specialists, each compared with ChatGPT-4.0 and ChatGPT-3.5, were 100% (20 cases; P = 0.23 vs. ChatGPT-4.0, P = 0.0033 vs. ChatGPT-3.5), 90% (18 cases; P = 0.99, P = 0.6), and 90% (18 cases; P = 0.99, P = 0.6), respectively. The interobserver agreement between ChatGPT-4.0 and ChatGPT-3.5 was 65% (13 cases), whereas the interobserver agreement between ChatGPT-4.0 and the 3 corneal specialists was 85% (17 cases), 80% (16 cases), and 75% (15 cases), respectively. By contrast, the interobserver agreement between ChatGPT-3.5 and each of the 3 corneal specialists was 60% (12 cases).
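The accuracy figures and the comparison with the first specialist can be checked with a short calculation. The sketch below is illustrative only: the abstract does not name the statistical test used, so a two-sided Fisher's exact test on correct/incorrect counts is an assumption here, and the function name `fisher_exact_two_sided` is hypothetical. Under that assumption, the calculation gives values close to the reported P = 0.23 and P = 0.0033 for the specialist with 20 of 20 correct.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the first row holds x of the col1 "correct" cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


N_CASES = 20  # number of cases, from the abstract

# Correct diagnoses reported in the abstract.
correct = {"ChatGPT-4.0": 17, "ChatGPT-3.5": 12, "Specialist 1": 20}

# Accuracy is simply correct / total.
accuracy = {name: k / N_CASES for name, k in correct.items()}

# Specialist 1 versus each model (assumed test: Fisher's exact, two-sided).
s1 = correct["Specialist 1"]
p_vs_gpt4 = fisher_exact_two_sided(s1, N_CASES - s1, 17, N_CASES - 17)
p_vs_gpt35 = fisher_exact_two_sided(s1, N_CASES - s1, 12, N_CASES - 12)

if __name__ == "__main__":
    for name, acc in accuracy.items():
        print(f"{name}: {acc:.0%}")
    print(f"Specialist 1 vs ChatGPT-4.0: P = {p_vs_gpt4:.2f}")
    print(f"Specialist 1 vs ChatGPT-3.5: P = {p_vs_gpt35:.4f}")
```

With only 20 cases, an exact test is a natural choice for such a sketch, because several cells of the 2x2 tables are very small or zero.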

Conclusions: The accuracy of ChatGPT-4.0 in diagnosing patients with various corneal conditions was markedly better than that of ChatGPT-3.5 and is promising for potential clinical integration. A balanced approach that combines artificial intelligence-generated insights with clinical expertise is key to realizing its full potential in eye care.

MeSH terms

Artificial Intelligence*
Cornea
Corneal Diseases* / diagnosis
Databases, Factual
Humans