Automated O-RADS Risk Stratification Using a Large Language Model Analysis of Narrative Ultrasound Reports

Ultrasound Med Biol. 2026 Jul;52(7):1363-1373. doi: 10.1016/j.ultrasmedbio.2026.03.009. Epub 2026 Apr 11.

Abstract

Background: The Ovarian-Adnexal Reporting and Data System (O-RADS) is essential for standardizing the risk stratification of ovarian lesions detected on ultrasound. However, manual assignment of O-RADS scores is time-consuming and can vary between observers. This study investigates an automated method for O-RADS scoring using a large language model (LLM) to analyze narrative ultrasound reports.

Methods: A two-stage pipeline was developed for automated O-RADS classification. In the first stage, the Lingshu LLM, which is specialized in medical language, extracted and embedded features from free-text descriptions of ovarian lesions, capturing the key diagnostic features mentioned by sonologists. In the second stage, these embedded features were used to train and evaluate several machine learning algorithms, including logistic regression (LR), support vector machines and random forests, to predict O-RADS scores (1-5).
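
The two-stage structure described above (text to fixed-length vector, then vector to O-RADS class) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the bag-of-words `embed` function is a stand-in for the Lingshu embeddings, the logistic regression is a small gradient-descent version without an intercept, and the four report snippets and their labels are invented placeholders with no clinical validity. All function names here are hypothetical.

```python
import math
from collections import Counter


def build_vocab(texts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(text, vocab):
    """Stage 1 stand-in: bag-of-words counts (NOT an LLM embedding)."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(text.lower().split()).items():
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] = float(n)
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_logreg(X, y, classes, lr=0.5, epochs=200):
    """Stage 2: multinomial logistic regression, batch gradient descent.

    Assumption: no intercept and no regularization, to keep the sketch short.
    """
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in classes]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in classes]
        for x, label in zip(X, y):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
            for k, c in enumerate(classes):
                err = probs[k] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x):
                    grad[k][j] += err * v
        for k in range(len(classes)):
            for j in range(n_features):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(x, W, classes):
    """Return the class whose linear score is highest."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return classes[scores.index(max(scores))]


# Invented toy snippets; the labels are placeholders, not clinical assignments.
reports = [
    "simple cyst anechoic thin wall",
    "anechoic simple cyst smooth wall",
    "solid mass irregular contour high vascularity",
    "irregular solid lesion with ascites",
]
labels = [2, 2, 5, 5]

vocab = build_vocab(reports)
classes = sorted(set(labels))
W = fit_logreg([embed(r, vocab) for r in reports], labels, classes)
```

A new description would then be classified with `predict(embed(new_text, vocab), W, classes)`. In a pipeline like the one the abstract describes, `embed` would be replaced by vectors produced by the language model, and the classifier would be trained on all five O-RADS categories.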

Results: The proposed method was evaluated on a dataset of 513 cases using fivefold cross-validation. The pipeline using Lingshu model embeddings with LR achieved the highest accuracy of 0.803 [95% CI: 0.753, 0.853], a weighted-average F1-score of 0.819 [95% CI: 0.777, 0.861] and a macro-averaged AUROC of 0.948 [95% CI: 0.937, 0.959]. This outperformed the MedGemma model's pipeline, which had an accuracy of 0.760 [95% CI: 0.700, 0.820], F1-score of 0.787 [95% CI: 0.739, 0.835] and AUROC of 0.941 [95% CI: 0.911, 0.971].
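
For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute them for a multi-class problem. It is an illustration under assumptions, not the authors' evaluation code: the abstract does not state how AUROC was extended to five classes, so a one-vs-rest scheme with an unweighted mean is assumed here, and the function names are hypothetical. Confidence intervals are not covered because the abstract does not describe how they were derived.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for c, n in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += n * f1
    return total / len(y_true)


def macro_auroc(y_true, probs, classes):
    """One-vs-rest AUROC per class, then the unweighted mean (assumed scheme).

    probs[i][k] is the predicted probability of classes[k] for case i.
    Each class's AUROC is the share of (positive, negative) pairs in which
    the positive case scores higher; ties count as half.
    Every class must have at least one positive and one negative case.
    """
    aucs = []
    for k, c in enumerate(classes):
        pos = [p[k] for t, p in zip(y_true, probs) if t == c]
        neg = [p[k] for t, p in zip(y_true, probs) if t != c]
        wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs)
```

Under fivefold cross-validation, these functions would be applied to the held-out predictions of each fold. The weighted F1 reflects class imbalance through the support weights, whereas the macro-averaged AUROC treats every O-RADS category equally.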

Conclusion: This study introduces a novel approach to automate O-RADS scoring using LLMs for feature extraction and traditional machine learning for classification. The results indicate that this method can accurately stratify ovarian cancer risk, potentially improving clinical workflow efficiency and reducing diagnostic variability. This approach may support radiologists in making more consistent and timely assessments.

Keywords: Large language model (LLM); Machine learning (ML); Natural language processing (NLP); Ovarian cancer; Ovarian-Adnexal Reporting and Data System (O-RADS); Risk stratification; Ultrasound radiology.

MeSH terms

  • Female
  • Humans
  • Large Language Models
  • Machine Learning
  • Natural Language Processing
  • Ovarian Neoplasms* / diagnostic imaging
  • Radiology Information Systems*
  • Risk Assessment
  • Ultrasonography / methods