Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools
- PMID: 31727150
- PMCID: PMC6857345
- DOI: 10.1186/s13643-019-1222-2
Abstract
Background: We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool.
Methods: We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey.
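For concreteness, the sketch below illustrates how the metrics reported here might be computed for the automated simulation. This is a minimal illustration, not the authors' code: the function name, the half-minute-per-record screening rate, and the assumption that predicted-relevant records are each screened once by a human are hypothetical stand-ins for the paper's exact workload model.

```python
# Minimal sketch (not the authors' code) of the evaluation metrics for the
# automated simulation, relative to dual independent screening (every record
# screened by two reviewers). All names and rates here are illustrative.

def evaluate_automated(truth, predictions, n_training=200,
                       minutes_per_record=0.5):
    """truth, predictions: parallel lists of bools for the records remaining
    after the training set (True = relevant). Returns the proportion of truly
    relevant records missed, the workload savings, and the time savings in
    hours."""
    relevant = sum(truth)
    missed = sum(t and not p for t, p in zip(truth, predictions))
    proportion_missed = missed / relevant if relevant else 0.0

    n_total = n_training + len(truth)
    baseline_screens = 2 * n_total       # two reviewers screen everything
    screens_used = 2 * n_training        # training set screened by both
    # Assumption: records the tool predicts relevant are then screened once.
    screens_used += sum(predictions)
    workload_savings = 1 - screens_used / baseline_screens
    time_savings_hours = ((baseline_screens - screens_used)
                          * minutes_per_record / 60)
    return proportion_missed, workload_savings, time_savings_hours
```

A semi-automated variant of the same calculation would charge one human screen for every record and a second screen only for those the tool predicts relevant, which is consistent with the roughly 35 to 49 percent workload savings reported for that simulation below.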
Results: Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33 to 90 percent of the records missed by a single reviewer. RobotAnalyst performed less well, and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s).
Conclusions: The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows.
Systematic review registration: Not applicable.
Keywords: Automation; Machine learning; Systematic reviews; Usability; User experience.
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Performance and Usability of Machine Learning for Screening in Systematic Reviews: A Comparative Evaluation of Three Tools [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Nov. Report No.: 19(20)-EHC027-EF. PMID: 31790164.
- Technology-assisted title and abstract screening for systematic reviews: a retrospective evaluation of the Abstrackr machine learning tool. Syst Rev. 2018 Mar 12;7(1):45. doi: 10.1186/s13643-018-0707-8. PMID: 29530097.
- The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr's relevance predictions in systematic and rapid reviews. BMC Med Res Methodol. 2020 Jun 3;20(1):139. doi: 10.1186/s12874-020-01031-w. PMID: 32493228.
- Assessing the accuracy of machine-assisted abstract screening with DistillerAI: a user study. Syst Rev. 2019 Nov 15;8(1):277. doi: 10.1186/s13643-019-1221-3. PMID: 31727159.
- Assessing the Accuracy of Machine-Assisted Abstract Screening With DistillerAI: A User Study [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Nov. Report No.: 19(20)-EHC026-EF. PMID: 31804782.
Cited by
- Semi-automated title-abstract screening using natural language processing and machine learning. Syst Rev. 2024 Nov 1;13(1):274. doi: 10.1186/s13643-024-02688-w. PMID: 39487499.
- An exploration of available methods and tools to improve the efficiency of systematic review production: a scoping review. BMC Med Res Methodol. 2024 Sep 18;24(1):210. doi: 10.1186/s12874-024-02320-4. PMID: 39294580.
- Recommendations on the surveillance and supplementation of vitamins and minerals for upper gastrointestinal cancer survivors: a scoping review. J Cancer Surviv. 2024 Aug 29. doi: 10.1007/s11764-024-01666-4. Online ahead of print. PMID: 39207682.
- Healthcare workers' informal uses of mobile phones and other mobile devices to support their work: a qualitative evidence synthesis. Cochrane Database Syst Rev. 2024 Aug 27;8(8):CD015705. doi: 10.1002/14651858.CD015705.pub2. PMID: 39189465.
- Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev. 2024 Jul 8;13(1):174. doi: 10.1186/s13643-024-02592-3. PMID: 38978132.
