Using an artificial intelligence tool can be as accurate as human assessors in level one screening for a systematic review

Health Info Libr J. 2024 Jun;41(2):136-148. doi: 10.1111/hir.12413. Epub 2021 Nov 18.

Abstract

Background: Artificial intelligence (AI) offers a promising solution to expedite various phases of the systematic review process such as screening.

Objective: We aimed to assess the accuracy of an AI tool in identifying eligible references for a systematic review compared to identification by human assessors.

Methods: For the case study (a systematic review of knowledge translation interventions), we used a diagnostic accuracy design in which human raters and the AI system DistillerAI (Evidence Partners, Ottawa, Canada) independently assessed a set of articles (n = 300) for eligibility. We analysed a series of 64 possible confidence levels for the AI's decisions and calculated several standard parameters of diagnostic accuracy for each.

Results: With a lower AI confidence threshold of 0.1 or greater and an upper threshold of 0.9 or lower, DistillerAI's article selection decisions closely matched those of human assessors. Within this range, DistillerAI made a decision on most articles (93-100%), with a sensitivity of 1.0 and specificity ranging from 0.9 to 1.0.
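
To make the reported quantities concrete, the following is a minimal sketch of how the proportion of articles decided, sensitivity and specificity could be computed for one pair of confidence thresholds. It rests on assumptions that are not taken from the study: the AI is assumed to return a score between 0 and 1 per article, to exclude at or below the lower threshold, to include at or above the upper threshold, and to leave everything in between undecided. The function and class names are hypothetical, human decisions are treated as the reference standard, and this is not DistillerAI's actual interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScreeningAccuracy:
    decided_fraction: float        # share of articles the AI made a decision on
    sensitivity: Optional[float]   # None if no decided article was human-included
    specificity: Optional[float]   # None if no decided article was human-excluded


def evaluate_thresholds(
    scores: Sequence[float],
    human_include: Sequence[bool],
    lower: float,
    upper: float,
) -> ScreeningAccuracy:
    """Compare AI decisions with human decisions for one threshold pair.

    Assumed rule (illustrative only): score <= lower means exclude,
    score >= upper means include, anything in between is undecided.
    """
    if len(scores) != len(human_include):
        raise ValueError("scores and human_include must have the same length")
    if not scores:
        raise ValueError("at least one article is required")
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")

    tp = fp = tn = fn = 0
    for score, truth in zip(scores, human_include):
        if score >= upper:
            ai_include = True
        elif score <= lower:
            ai_include = False
        else:
            continue  # undecided: left for human screening
        if ai_include and truth:
            tp += 1
        elif ai_include and not truth:
            fp += 1
        elif not ai_include and truth:
            fn += 1
        else:
            tn += 1

    decided = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return ScreeningAccuracy(decided / len(scores), sensitivity, specificity)


# Toy data, invented purely for illustration (not from the study).
toy_scores = [0.95, 0.05, 0.50, 0.92, 0.02, 0.97]
toy_human = [True, False, True, True, False, False]
result = evaluate_thresholds(toy_scores, toy_human, lower=0.1, upper=0.9)
print(result)
```

Repeating such a calculation over a grid of lower and upper thresholds would yield one set of accuracy parameters per confidence level, which is the kind of comparison the Methods describe.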

Conclusion: DistillerAI appears to assess articles accurately in this case study of 300 articles. Further experimentation with DistillerAI is needed to establish its performance in other subject areas.

Keywords: artificial intelligence; precision; recall; review; systematic.

MeSH terms

  • Artificial Intelligence* / trends
  • Humans
  • Systematic Reviews as Topic