Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers

Artif Intell Med. 2012 Jul;55(3):197-207. doi: 10.1016/j.artmed.2012.05.002. Epub 2012 Jun 5.

Abstract

Objectives: To investigate whether (1) machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with optimization; and (3) the number of citations to screen can be reduced.

Methods: We used an open-source data-mining suite to process and classify biomedical citations pointing to mostly nonrandomized studies from 2 systematic reviews. We built training and test sets for citation portions and compared classifier performance while considering the value of indexing, various feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I, with no optimization, was 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest neighbor, naïve Bayes, complement naïve Bayes, and evolutionary support vector machine. Feature sets included bag of words, and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts, and full citations with metadata. Phase II, with optimization, involved a subset of the classifiers, as well as features extracted from full citations and from full citations with overweighted titles. We optimized features and classifier parameters by manually setting information gain thresholds within a process of iterative grid optimization with 10-fold cross-validation. We independently tested models on data reserved for that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated the number of citations reviewers would need to screen during a second pass through a reduced set of citations.
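To make the design concrete, the following is a minimal sketch in Python with scikit-learn, rather than the unnamed data-mining suite the authors used, of one cell of the phase I grid (bag-of-words features, complement naïve Bayes, recall scored by 10-fold cross-validation) and of a phase II-style optimization loop. The mutual-information filter stands in for the paper's information gain thresholds, and every identifier and parameter value below is an illustrative assumption, not the authors' configuration.

    # Hypothetical sketch of one phase I cell and a phase II-style
    # optimization loop; scikit-learn stands in for the suite used in the
    # paper, and all parameter values are illustrative.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.naive_bayes import ComplementNB
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV, cross_val_score

    # texts, labels = load_citations()  # hypothetical loader: citation
    # portions (e.g., titles + abstracts) and eligibility labels (1 =
    # eligible for full-text screening, 0 = not).

    # Phase I (no optimization): bag of words -> complement naive Bayes.
    phase1 = Pipeline([
        ("bow", CountVectorizer(ngram_range=(1, 1))),  # (2, 2) or (3, 3)
                                                       # for the n-gram sets
        ("cnb", ComplementNB()),
    ])
    # recall = cross_val_score(phase1, texts, labels, cv=10,
    #                          scoring="recall")

    # Phase II (optimization): filter features by mutual information (a
    # proxy for the paper's information gain thresholds) and grid-search
    # the classifier's smoothing parameter with 10-fold cross-validation.
    phase2 = Pipeline([
        ("bow", CountVectorizer()),
        ("ig", SelectKBest(mutual_info_classif)),
        ("cnb", ComplementNB()),
    ])
    grid = GridSearchCV(
        phase2,
        param_grid={"ig__k": [500, 1000, 2000],
                    "cnb__alpha": [0.1, 0.5, 1.0]},
        cv=10,
        scoring="recall",
    )
    # grid.fit(train_texts, train_labels)  # then evaluate on the
    #                                      # independently held-out test set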

Results: In phase I, the evolutionary support vector machine returned the best recall for bag of words extracted from full citations; the best classifier with respect to overall performance was k-nearest neighbor. Without optimization, no classifier attained recall adequate for this task. In phase II, optimization boosted the performance of the evolutionary support vector machine and complement naïve Bayes classifiers. Generalization performance was better for the latter in the independent tests. For the evolutionary support vector machine and complement naïve Bayes classifiers, the initial retrieval set was reduced by 46% and 35%, respectively.
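To clarify how such reduction figures relate to reviewer workload: if a classifier screens out part of the initial retrieval set, reviewers need a second pass only through what remains. Below is a minimal sketch of that arithmetic, assuming the reported percentages are the share of the initial retrieval set removed; the function name and the numbers are hypothetical, not data from the study.

    # Hypothetical arithmetic behind a reported workload reduction: the
    # fraction of the initial retrieval set the classifier screens out.
    def workload_reduction(n_initial: int, n_to_rescreen: int) -> float:
        """Fraction of citations reviewers no longer need to screen."""
        return 1.0 - n_to_rescreen / n_initial

    # e.g., if 10,000 citations were retrieved and a classifier flags
    # 5,400 for a second human pass, the reduction is 46%
    # (illustrative numbers only).
    print(workload_reduction(10_000, 5_400))  # 0.46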

Conclusions: Machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers. Optimization can markedly improve classifier performance; however, generalizability varies with the classifier. The number of citations to screen during a second, independent pass can be substantially reduced.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Bayes Theorem
  • Biomedical Research / classification*
  • Data Mining / methods*
  • Humans
  • Medical Informatics
  • Review Literature as Topic*
  • Support Vector Machine*