Machine learning for biomedical literature triage

PLoS One. 2014 Dec 31;9(12):e115892. doi: 10.1371/journal.pone.0115892. eCollection 2014.


This paper presents a machine learning system that supports triage, the first task in the manual curation of biological literature. We compare the performance of classification models by varying dataset sampling factors, the feature set, and the machine learning algorithm (Naive Bayes, Support Vector Machine, and Logistic Model Trees). The results show that the model best suited to the imbalanced datasets of the triage classification task combines domain-relevant features, an under-sampling technique, and the Logistic Model Trees algorithm.
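The abstract does not say which under-sampling procedure was used, so the following is only a minimal sketch of one common variant, random under-sampling of the majority class. The function name `undersample`, the `ratio` parameter (a stand-in for the "sampling factor" mentioned above), and the toy labels are assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import Counter


def undersample(examples, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class examples until the majority count is
    at most `ratio` times the minority count.

    `ratio` is a hypothetical knob; the paper's actual sampling factors
    are not given in the abstract.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Never ask for more majority examples than exist.
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    # Keep every minority example plus a random subset of the majority.
    kept = sorted(rng.sample(majority, keep_n) + minority)
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data (invented): 90 irrelevant documents and 10 relevant ones.
docs = [f"doc{i}" for i in range(100)]
labels = ["irrelevant"] * 90 + ["relevant"] * 10

bal_docs, bal_labels = undersample(docs, labels, "irrelevant", ratio=1.0)
print(Counter(bal_labels))
```

With `ratio=1.0` the two classes end up the same size. A larger ratio keeps more of the majority class, which trades some balance for more training data.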

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Databases, Bibliographic*
  • Decision Trees
  • Medical Informatics / methods*
  • Models, Theoretical
  • Support Vector Machine*

Grants and funding

This work was supported by funding from Genome Canada, Genome Quebec, and Genome Alberta to AT. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.