[Evaluating the accuracy of large language models in answering mammography screening questions in Italian and English: a study based on the Eusobi guidelines]

Recenti Prog Med. 2025 Mar;116(3):162-167. doi: 10.1701/4460.44556.
[Article in Italian]

Abstract

Introduction: Artificial intelligence (AI) is transforming various aspects of everyday life, including healthcare, through large language models (LLMs) such as ChatGPT, Gemini, and Copilot. These systems are increasingly used to disseminate medical information, giving patients access to simplified explanations. This study compares responses to breast imaging questions posed in Italian and in English, based on EUSOBI guidelines, to evaluate the LLMs' ability to provide accurate and complete answers on mammography screening concepts.

Materials and methods: Nine questions related to breast cancer screening were developed by five breast radiologists based on EUSOBI recommendations. These questions were submitted to ChatGPT, Gemini, and Copilot in both Italian and English. Two expert breast radiologists rated the responses on a Likert scale (1 to 5), and statistical analysis compared accuracy, average response length, use of radiological sources, and inter-reader agreement.
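
As a rough illustration of the kind of analysis described above, the following minimal sketch computes each reader's mean Likert score and a quadratic-weighted Cohen's kappa as a measure of agreement between two readers. The abstract does not name the agreement statistic the authors used, so the choice of weighted kappa is an assumption, and the scores in the example are invented placeholders, not study data. The function name `weighted_kappa` is hypothetical.

```python
from collections import Counter
from statistics import mean


def weighted_kappa(reader_a, reader_b, categories=(1, 2, 3, 4, 5)):
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal scale.

    Assumption: this statistic is chosen for illustration only; the
    abstract does not state which agreement measure was actually used.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("Both readers must rate the same non-empty set of items")

    n = len(reader_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint counts of (reader A score, reader B score) and each reader's marginals.
    joint = Counter((index[a], index[b]) for a, b in zip(reader_a, reader_b))
    marg_a = Counter(index[a] for a in reader_a)
    marg_b = Counter(index[b] for b in reader_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            observed += weight * joint[(i, j)] / n
            expected += weight * (marg_a[i] / n) * (marg_b[j] / n)

    if expected == 0:
        # Both readers gave one identical score to every item; kappa is undefined.
        return float("nan")
    return 1.0 - observed / expected


# Hypothetical scores for nine questions (NOT data from the study).
scores_reader_a = [4, 5, 3, 4, 4, 2, 5, 4, 3]
scores_reader_b = [4, 4, 3, 5, 4, 3, 5, 4, 3]

print("Mean score, reader A:", round(mean(scores_reader_a), 2))
print("Mean score, reader B:", round(mean(scores_reader_b), 2))
print("Weighted kappa:", round(weighted_kappa(scores_reader_a, scores_reader_b), 2))
```

Quadratic weights penalize a disagreement of several Likert points more heavily than a one-point difference, which tends to suit ordinal ratings; an unweighted kappa would treat all disagreements equally.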

Results: The average scores for responses were similar in both languages, ranging from 3.6 to 4 out of 5. Questions on general mammography concepts received more accurate answers, while more specific questions based on the latest guidelines received incomplete responses, especially regarding the definition of dense breasts. The sources used, particularly in Italian, were often not specialized in radiology, highlighting a limitation of LLMs in providing detailed and up-to-date medical answers.

Conclusions: The study shows that LLMs are useful tools for medical communication, but they have limitations in delivering accurate answers on highly specialized medical topics. To improve the quality of information, collaboration between AI experts and healthcare professionals is necessary, especially in breast cancer prevention and screening.

Publication types

  • Comparative Study
  • English Abstract

MeSH terms

  • Artificial Intelligence*
  • Breast Neoplasms* / diagnosis
  • Breast Neoplasms* / diagnostic imaging
  • Early Detection of Cancer* / methods
  • Female
  • Humans
  • Italy
  • Language*
  • Large Language Models
  • Mammography* / methods
  • Mass Screening / methods
  • Practice Guidelines as Topic
  • Radiologists