Retrieval augmented scientific claim verification

Hao Liu; Ali Soroush; Jordan G Nestor; Elizabeth Park; Betina Idnay; Yilu Fang; Jane Pan; Stan Liao; Marguerite Bernard; Yifan Peng; Chunhua Weng

doi:10.1093/jamiaopen/ooae021

JAMIA Open. 2024 Feb 21;7(1):ooae021. doi: 10.1093/jamiaopen/ooae021. eCollection 2024 Apr.

Authors

Hao Liu¹, Ali Soroush², Jordan G Nestor², Elizabeth Park², Betina Idnay³, Yilu Fang³, Jane Pan⁴, Stan Liao⁴, Marguerite Bernard⁵, Yifan Peng⁶, Chunhua Weng³

Affiliations

¹ School of Computing, Montclair State University, Montclair, NJ 07043, United States.
² Department of Medicine, Columbia University, New York, NY 10027, United States.
³ Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States.
⁴ Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States.
⁵ Institute of Human Nutrition, Columbia University, New York, NY 10027, United States.
⁶ Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States.

Abstract

Objective: To automate scientific claim verification using PubMed abstracts.

Materials and methods: We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify each rationale as supporting, refuting, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189 648 PubMed abstracts extracted from January 2010 to October 2021.
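The abstract does not state how the three models' outputs are combined. As an illustration only, the sketch below assumes simple majority voting over per-model labels, with ties resolved to the neutral label. The label strings, the function name `ensemble_label`, and the tie-breaking rule are all hypothetical and are not taken from CliVER.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label names; the abstract names the classes only as
# support, refute, and neutral.
LABELS = ("SUPPORT", "REFUTE", "NEUTRAL")


def ensemble_label(votes: Sequence[str]) -> str:
    """Combine per-model labels by majority vote.

    Assumption: ties (for example, three models giving three different
    labels) fall back to NEUTRAL. The paper may use a different rule.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown label: {vote!r}")

    ranked = Counter(votes).most_common()
    best_label, best_count = ranked[0]
    # A second label with the same count means there is no clear majority.
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return "NEUTRAL"
    return best_label
```

For example, under this assumed rule, two models voting to support and one voting to refute would yield a supporting label, while a three-way split would yield a neutral one.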

Results: On CoVERt, CliVER achieved an F1 score of 0.92 for label prediction, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model by an absolute increase of 3% to 11% in F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction.
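For readers less familiar with the reported metrics, the sketch below computes precision, recall, and F1 from raw counts using their standard definitions. The function name and the example counts are hypothetical; the abstract does not report the underlying counts or say whether the F1 score is micro- or macro-averaged.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = 2 * precision * recall / (precision + recall)

    A metric is defined as 0.0 here when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

With these illustrative counts, precision and recall are both 8/10, so their harmonic mean (F1) equals the same value.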

Conclusion: CliVER demonstrates early potential to automate scientific claim verification, using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.

Keywords: clinical trial; deep learning; evidence appraisal; evidence retrieval; natural language processing.
