Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer

Alexander Pan; David Musheyev; Daniel Bockelman; Stacy Loeb; Abdo E Kabarriti

doi:10.1001/jamaoncol.2023.2947

Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer

JAMA Oncol. 2023 Oct 1;9(10):1437-1440. doi: 10.1001/jamaoncol.2023.2947.

Authors

Alexander Pan¹, David Musheyev¹, Daniel Bockelman¹, Stacy Loeb^{2

3

4}, Abdo E Kabarriti¹

Affiliations

¹ Department of Urology, State University of New York Downstate Health Sciences University, New York.
² Department of Urology, New York University School of Medicine, New York.
³ Department of Population Health, New York University School of Medicine, New York.
⁴ Department of Surgery, VA New York Harbor Health Care, New York.

PMID: 37615960
PMCID: PMC10450581 (available on 2024-08-24)
DOI: 10.1001/jamaoncol.2023.2947

Abstract

Importance: Consumers are increasingly using artificial intelligence (AI) chatbots as a source of information. However, the quality of the cancer information generated by these chatbots has not yet been evaluated using validated instruments.

Objective: To characterize the quality of information and presence of misinformation about skin, lung, breast, colorectal, and prostate cancers generated by 4 AI chatbots.

Design, setting, and participants: This cross-sectional study assessed AI chatbots' text responses to the 5 most commonly searched queries related to the 5 most common cancers using validated instruments. Search data were extracted from the publicly available Google Trends platform and identical prompts were used to generate responses from 4 AI chatbots: ChatGPT version 3.5 (OpenAI), Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft).

Exposures: Google Trends' top 5 search queries related to skin, lung, breast, colorectal, and prostate cancer from January 1, 2021, to January 1, 2023, were input into 4 AI chatbots.

Main outcomes and measures: The primary outcomes were the quality of consumer health information based on the validated DISCERN instrument (scores from 1 [low] to 5 [high] for quality of information) and the understandability and actionability of this information based on the understandability and actionability domains of the Patient Education Materials Assessment Tool (PEMAT) (scores of 0%-100%, with higher scores indicating a higher level of understandability and actionability). Secondary outcomes included misinformation scored using a 5-item Likert scale (scores from 1 [no misinformation] to 5 [high misinformation]) and readability assessed using the Flesch-Kincaid Grade Level readability score.

Results: The analysis included 100 responses from 4 chatbots about the 5 most common search queries for skin, lung, breast, colorectal, and prostate cancer. The quality of text responses generated by the 4 AI chatbots was good (median [range] DISCERN score, 5 [2-5]) and no misinformation was identified. Understandability was moderate (median [range] PEMAT Understandability score, 66.7% [33.3%-90.1%]), and actionability was poor (median [range] PEMAT Actionability score, 20.0% [0%-40.0%]). The responses were written at the college level based on the Flesch-Kincaid Grade Level score.

Conclusions and relevance: Findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information.

Publication types

Comment