Appropriate trust in artificial intelligence for the optical diagnosis of colorectal polyps: the role of human/artificial intelligence interaction

Gastrointest Endosc. 2024 Dec;100(6):1070-1078.e10. doi: 10.1016/j.gie.2024.06.029. Epub 2024 Jun 26.

Abstract

Background and aims: Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been thoroughly investigated. However, studies on human-artificial intelligence interaction are lacking. Our aim was to investigate endoscopists' trust in CADx by evaluating whether communicating a calibrated algorithm confidence score improved trust.

Methods: Endoscopists optically diagnosed 60 colorectal polyps. Initially, endoscopists diagnosed the polyps without CADx assistance (initial diagnosis). Immediately afterward, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or premalignant) or a prediction accompanied by a calibrated confidence score (0-100), where 0 indicated a benign prediction and 100 a (pre)malignant prediction. For half of the polyps, CADx was mandatory; for the other half, CADx was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the reference standard. Endoscopists' trust in CADx was measured as CADx prediction utilization: the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction.
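The outcome definitions above can be made concrete with a small sketch. It assumes each reading is stored as a record with the initial diagnosis, the CADx prediction, the final diagnosis, and the histopathology label, each as the string "benign" or "premalignant"; the names `Case`, `trust_metrics`, and `accuracy` are hypothetical and this is not the authors' analysis code. Following the abstract, utilization and appropriate trust are computed only over cases where the initial diagnosis disagreed with CADx.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical record layout; labels are "benign" or "premalignant".
    initial: str    # endoscopist's diagnosis before seeing CADx
    cadx: str       # CADx prediction shown afterwards
    final: str      # endoscopist's diagnosis after reviewing CADx
    histology: str  # histopathology reference standard


def accuracy(cases, field):
    """Fraction of cases where the given diagnosis field matches histology."""
    return sum(getattr(c, field) == c.histology for c in cases) / len(cases)


def trust_metrics(cases):
    """Return (utilization, appropriate_trust), both over disagreements only."""
    disagreements = [c for c in cases if c.initial != c.cadx]
    if not disagreements:
        raise ValueError("no initial disagreements with CADx")
    n = len(disagreements)
    # Utilization: the endoscopist switched to the CADx prediction.
    followed = sum(c.final == c.cadx for c in disagreements)
    # Appropriate trust: a correct CADx prediction was followed,
    # or an incorrect CADx prediction was disregarded.
    appropriate = sum(
        (c.final == c.cadx) == (c.cadx == c.histology) for c in disagreements
    )
    return followed / n, appropriate / n


cases = [
    Case("benign", "premalignant", "premalignant", "premalignant"),  # correct CADx followed
    Case("benign", "premalignant", "benign", "benign"),              # incorrect CADx disregarded
    Case("premalignant", "benign", "benign", "premalignant"),        # incorrect CADx followed
    Case("benign", "benign", "benign", "benign"),                    # agreement, excluded
]
utilization, appropriate_trust = trust_metrics(cases)
print(utilization, appropriate_trust)
print(accuracy(cases, "initial"), accuracy(cases, "final"))
```

In this toy example three of the four readings are disagreements; two of them follow CADx and two show appropriate trust, so both metrics equal 2/3.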

Results: Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists' diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, P < .001). The CADx prediction was used in 36.5% (n = 183 of 501) of disagreements. Adding a confidence score led to lower CADx prediction utilization, except when the confidence score exceeded 60. Mandatory CADx decreased CADx prediction utilization compared with optional CADx. Appropriate trust (using correct or disregarding incorrect CADx predictions) was 48.7% (n = 244 of 501).
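As a quick arithmetic check, the reported percentages follow from the counts given above, both taken over the same 501 disagreements; the accuracy gain is the simple difference of the two reported accuracies. The helper name `pct` is hypothetical.

```python
def pct(k, n):
    """Percentage k/n rounded to one decimal place."""
    return round(100 * k / n, 1)


# (count, denominator, reported percentage) as stated in the Results
reported = {
    "CADx prediction utilization": (183, 501, 36.5),
    "appropriate trust": (244, 501, 48.7),
}
for name, (k, n, expected) in reported.items():
    print(f"{name}: {k}/{n} = {pct(k, n)}% (reported {expected}%)")

# Absolute accuracy gain in percentage points (76.6% final vs 69.3% initial)
gain = round(76.6 - 69.3, 1)
print(f"accuracy gain: {gain} percentage points")
```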

Conclusions: Appropriate trust was common, and CADx prediction utilization was highest for optional CADx without confidence scores. These results underscore the importance of a better understanding of human-artificial intelligence interaction.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Colonic Polyps* / diagnosis
  • Colonoscopy* / methods
  • Colorectal Neoplasms / diagnosis
  • Diagnosis, Computer-Assisted* / methods
  • Female
  • Humans
  • Male
  • Middle Aged
  • Precancerous Conditions / diagnosis
  • Precancerous Conditions / pathology
  • Trust*