Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models

ArXiv [Preprint]. 2025 Feb 12:arXiv:2502.09659v1.

Abstract

Motivation: An adjuvant is a chemical incorporated into vaccines that enhances their efficacy by improving the immune response. Identifying adjuvant names in cancer vaccine studies is essential for furthering research and improving immunotherapies. However, manual curation of the constantly expanding biomedical literature poses significant challenges. This study explores the automated recognition of vaccine adjuvant names using state-of-the-art Large Language Models (LLMs), specifically Generative Pretrained Transformers (GPT) and Large Language Model Meta AI (Llama).

Methods: We used two datasets: 97 clinical trial records from AdjuvareDB and 290 PubMed abstracts annotated with the Vaccine Adjuvant Compendium (VAC). Two LLMs, GPT-4o and Llama 3.2, were employed in zero-shot and few-shot learning paradigms with up to four examples per prompt. Prompts explicitly targeted adjuvant names and tested the impact of adding contextual information, such as substances or interventions. Outputs underwent automated and manual validation for accuracy and consistency.
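The zero-shot and few-shot setup described in the Methods can be illustrated with a minimal Python sketch. It assumes a hypothetical plain-text prompt layout (an instruction, up to four solved demonstrations, then the query record ending in an open `Adjuvants:` slot) and treats a list of interventions as the optional added context. The `Example` class, the `render` and `build_prompt` functions, and the instruction wording are illustrative assumptions, not the prompts used in the study or its repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical instruction wording; not the prompt text used in the study.
INSTRUCTION = (
    "Identify all vaccine adjuvant names in the record below. "
    "Answer with a comma-separated list, or 'None' if no adjuvant is mentioned."
)
MAX_SHOTS = 4  # the study used up to four examples per prompt


@dataclass
class Example:
    """A record (trial text or abstract) with optional context and gold labels."""
    text: str
    adjuvants: list[str] = field(default_factory=list)
    interventions: list[str] = field(default_factory=list)


def render(record: Example, with_context: bool) -> str:
    """Format one record, optionally appending its interventions as extra context."""
    lines = [f"Record: {record.text}"]
    if with_context and record.interventions:
        lines.append("Interventions: " + "; ".join(record.interventions))
    return "\n".join(lines)


def build_prompt(query: Example, shots: list[Example], k: int,
                 with_context: bool = False) -> str:
    """Build a k-shot prompt; k = 0 gives a zero-shot prompt."""
    if not 0 <= k <= MAX_SHOTS:
        raise ValueError(f"k must be between 0 and {MAX_SHOTS}")
    parts = [INSTRUCTION]
    for ex in shots[:k]:
        answer = ", ".join(ex.adjuvants) or "None"
        parts.append(render(ex, with_context) + f"\nAdjuvants: {answer}")
    # The query ends with an open "Adjuvants:" slot for the model to complete.
    parts.append(render(query, with_context) + "\nAdjuvants:")
    return "\n\n".join(parts)


demo = Example(
    text="Patients received a peptide vaccine emulsified in Montanide ISA 51.",
    adjuvants=["Montanide ISA 51"],
    interventions=["peptide vaccine", "Montanide ISA 51"],
)
query = Example(
    text="A dendritic cell vaccine was administered with poly-ICLC.",
    interventions=["dendritic cell vaccine", "poly-ICLC"],
)
print(build_prompt(query, [demo], k=1, with_context=True))
```

Passing `with_context=True` applies the interventions field to the demonstrations and the query alike, so the model sees the same record format in every block of the prompt; with `k=0` the same function yields the zero-shot variant.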

Results: GPT-4o consistently attained 100% Precision in all settings, while also showing notable improvements in Recall and F1-score, particularly when interventions were included. On the VAC dataset, GPT-4o achieved a maximum F1-score of 77.32% with interventions, surpassing Llama-3.2-3B by approximately 2%. On the AdjuvareDB dataset, GPT-4o reached an F1-score of 81.67% with three-shot prompting and interventions, surpassing Llama-3.2-3B's maximum F1-score of 65.62%. These results highlight the critical role of contextual information in model performance, with GPT-4o demonstrating a superior ability to leverage this enrichment.
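Precision, Recall, and F1-score over extracted names can be computed in several ways. A minimal sketch, assuming micro-averaged, exact matching of case- and whitespace-normalized names and comma-separated model answers (the study's actual matching rules, which also involved manual validation, may differ), is shown below. The functions `normalize`, `parse_answer`, and `score` are hypothetical helpers, not code from the study's repository.

```python
from __future__ import annotations


def normalize(name: str) -> str:
    """Assumed matching key: lower-case with collapsed whitespace."""
    return " ".join(name.lower().split())


def parse_answer(answer: str) -> set[str]:
    """Split a comma-separated model answer into normalized names; 'None' -> empty set."""
    if answer.strip().lower() == "none":
        return set()
    return {normalize(n) for n in answer.split(",") if n.strip()}


def score(predictions: list[set[str]], gold: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact matches of normalized names."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    tp = fp = fn = 0
    for pred, ref in zip(predictions, gold):
        pred_n = {normalize(n) for n in pred}
        ref_n = {normalize(n) for n in ref}
        tp += len(pred_n & ref_n)  # names predicted and correct
        fp += len(pred_n - ref_n)  # names predicted but absent from the gold set
        fn += len(ref_n - pred_n)  # gold names the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


preds = [parse_answer("Montanide ISA 51"), parse_answer("poly-ICLC, GM-CSF"), parse_answer("None")]
gold = [{"Montanide ISA 51"}, {"Poly-ICLC", "QS-21"}, {"CpG 7909"}]
print(score(preds, gold))  # tp=2, fp=1, fn=2, so P=2/3, R=1/2, F1=4/7
```

The F1 formula also explains why recall dominates the GPT-4o results: when precision is 1, F1 = 2PR/(P+R) reduces to 2R/(1+R), so with 100% Precision every difference in F1 across settings reflects a difference in recall alone.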

Conclusion: Our findings demonstrate that LLMs excel at accurately identifying adjuvant names, including rare and novel naming variants. This study emphasizes the capability of LLMs to support cancer vaccine development by efficiently extracting insights from clinical trial data. Future work aims to extend the framework to a wider array of biomedical literature and to improve model generalizability across various vaccines and adjuvants.

Availability: Source code is available at https://github.com/hurlab/Vaccine-Adjuvant-LLM.

Publication types

  • Preprint