Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting

Yiwen Shi; Ping Ren; Jing Wang; Biao Han; Taha ValizadehAslani; Felix Agbavor; Yi Zhang; Meng Hu; Liang Zhao; Hualou Liang

doi:10.1016/j.jbi.2023.104533

Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting

J Biomed Inform. 2023 Dec:148:104533. doi: 10.1016/j.jbi.2023.104533. Epub 2023 Nov 2.

Authors

Yiwen Shi¹, Ping Ren², Jing Wang², Biao Han², Taha ValizadehAslani³, Felix Agbavor⁴, Yi Zhang², Meng Hu², Liang Zhao², Hualou Liang⁵

Affiliations

¹ College of Computing and Informatics, Drexel University, Philadelphia, PA, United States.
² Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States.
³ Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, United States.
⁴ School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States.
⁵ School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States. Electronic address: hualou.liang@drexel.edu.

PMID: 37918623
DOI: 10.1016/j.jbi.2023.104533

Abstract

Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment, which provides the basis of recommendations for fasting and fed bioequivalence studies to guide the pharmaceutical industry for developing generic drug products. However, manual summarization of food effect from extensive drug application review documents is time-consuming. Therefore, there is a need to develop automated methods to generate food effect summary. Recent advances in natural language processing (NLP), particularly large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability with regard to the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach,iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the iterative prompting process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. Taken together, these results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of the PSG assessment cycle and promoting generic drug product development.

Keywords: Drug labeling; GPT-4; Large Language Models; Prompt Engineering; Text Summarization.

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Benchmarking*
Drugs, Generic*
Language
Natural Language Processing

Substances

Drugs, Generic