Performance of artificial intelligence chatbots in interpreting clinical images of pressure injuries

Wound Repair Regen. 2024 May 15. doi: 10.1111/wrr.13189. Online ahead of print.

Abstract

This cross-sectional study evaluated the accuracy of five leading publicly available AI chatbots in staging pressure injuries from clinical images according to the National Pressure Injury Advisory Panel (NPIAP) staging system. Three chatbots were unable to interpret the clinical images. GPT-4 Turbo achieved a high accuracy rate (83.0%) in staging pressure injuries, significantly outperforming BingAI Creative mode (24.0%; p < 0.001). GPT-4 Turbo accurately identified Stage 1 (p < 0.001), Stage 3 (p = 0.001), and Stage 4 (p < 0.001) pressure injuries, as well as suspected deep tissue injuries (p < 0.001), whereas BingAI demonstrated significantly lower accuracy across all stages. The findings highlight the potential of AI chatbots, especially GPT-4 Turbo, in accurately staging pressure injuries from clinical images and aiding their subsequent management.

Keywords: AI chatbot; ChatGPT; artificial intelligence; pressure injury; pressure ulcer.