Automatic screening using word embeddings achieved high sensitivity and workload reduction for updating living network meta-analyses

J Clin Epidemiol. 2019 Apr:108:86-94. doi: 10.1016/j.jclinepi.2018.12.001. Epub 2018 Dec 7.

Abstract

Objectives: We aimed to develop and evaluate an algorithm for automatically screening citations when updating a living network meta-analysis (NMA).

Study design and setting: Our algorithm learns from the initial screening of citations conducted when creating an NMA to automatically identify eligible citations (i.e., those needing full-text consideration) when updating the NMA. We evaluated our algorithm on four NMAs from different medical domains. For each NMA, we constructed a set of initially screened citations and a set of citations to screen during an update that took place 2 years after the NMA was conducted. We encoded the free text of citations (title and abstract) using word embeddings. On top of this vectorized representation, we fitted a logistic regression model to the set of initially screened citations to predict the eligibility of citations screened during an update.
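The pipeline described above (embed the citation text, then fit a logistic regression on the initially screened citations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the three-dimensional toy embedding table, averaging word vectors to represent a citation, the plain gradient-descent fit, and the example citations and labels. A real setup would presumably load pretrained embeddings and use an established library for the regression.

```python
import math

# Hypothetical 3-dimensional toy embeddings (assumption for illustration only).
TOY_EMBEDDINGS = {
    "randomized": [0.9, 0.1, 0.0],
    "trial": [0.8, 0.2, 0.1],
    "placebo": [0.7, 0.0, 0.2],
    "review": [0.1, 0.9, 0.0],
    "editorial": [0.0, 0.8, 0.1],
    "mice": [0.1, 0.2, 0.9],
}
DIM = 3


def encode(text):
    """Average the embeddings of known words; zero vector if none are known."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w = [0.0] * DIM
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * DIM
        grad_b = 0.0
        for x, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - target
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, text):
    """Predicted probability that a citation is eligible."""
    w, b = model
    x = encode(text)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Made-up "initial screening" decisions: 1 = eligible, 0 = excluded.
initial = [
    ("randomized placebo trial", 1),
    ("randomized trial", 1),
    ("placebo trial", 1),
    ("editorial review", 0),
    ("review", 0),
    ("mice", 0),
]
model = fit_logistic([encode(t) for t, _ in initial], [label for _, label in initial])

# Made-up citations retrieved at update time, scored by the fitted model.
update = ["randomized trial placebo", "editorial", "review mice"]
scores = {t: predict_proba(model, t) for t in update}
for text, p in scores.items():
    print(f"{text!r}: predicted eligibility {p:.2f}")
```

In such a setup, the cut-off applied to the predicted probability governs the trade-off between sensitivity and workload reduction; the abstract does not state how that cut-off was chosen, so none is fixed in this sketch.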

Results: Our algorithm achieved 100% sensitivity on two NMAs (100% [95% confidence interval 93-100] and 100% [40-100]), and 94% (81-99) and 97% (86-100) on the other two. Across all four NMAs, our algorithm would have spared the manual screening of 1,345 of 2,530 citations, decreasing the workload by 53% (51-55), while missing 3 of 124 eligible citations (2% [1-7]), none of which were ultimately included in the NMAs after full-text consideration.
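The pooled figures above follow from simple proportions, which the short sketch below recomputes from the reported counts. The counts (1,345 of 2,530 and 3 of 124) come from the abstract; the Wilson score interval and the z value of 1.96 are assumptions, since the abstract does not say which interval method the authors used, so the bounds computed here need not match the published ones exactly.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts reported in the abstract.
spared, screened = 1345, 2530
missed, eligible = 3, 124

workload_reduction = spared / screened
missed_rate = missed / eligible
pooled_sensitivity = (eligible - missed) / eligible

wr_lo, wr_hi = wilson_interval(spared, screened)
mr_lo, mr_hi = wilson_interval(missed, eligible)

print(f"Workload reduction: {workload_reduction:.1%} ({wr_lo:.1%} to {wr_hi:.1%})")
print(f"Missed eligible citations: {missed_rate:.1%} ({mr_lo:.1%} to {mr_hi:.1%})")
print(f"Pooled sensitivity: {pooled_sensitivity:.1%}")
```

The pooled sensitivity printed last is simply the complement of the missed rate (121 of 124 eligible citations identified); the abstract itself reports sensitivity per NMA rather than pooled.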

Conclusion: For updating an NMA after 2 years, our algorithm considerably diminished the workload required for screening, and the number of missed eligible citations remained low.

Keywords: Automatic screening; Live cumulative network meta-analysis; Machine learning; Natural language processing; Network meta-analysis; Word embeddings.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Confidence Intervals
  • Evidence-Based Medicine / methods
  • Humans
  • Information Storage and Retrieval / methods*
  • Network Meta-Analysis*
  • Randomized Controlled Trials as Topic
  • Support Vector Machine
  • Workload