Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts

Chih-Hsuan Wei; Bethany R Harris; Donghui Li; Tanya Z Berardini; Eva Huala; Hung-Yu Kao; Zhiyong Lu

doi:10.1093/database/bas041

Database (Oxford). 2012 Nov 17:2012:bas041. doi: 10.1093/database/bas041. Print 2012.

Authors

Chih-Hsuan Wei¹, Bethany R Harris, Donghui Li, Tanya Z Berardini, Eva Huala, Hung-Yu Kao, Zhiyong Lu

Affiliation

¹ National Center for Biotechnology Information-NCBI, National Library of Medicine-NLM, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Abstract

Today's biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated.

Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

Publication types

Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Abstracting and Indexing*
Data Mining / methods*
Databases, Factual*
Feedback
Genes*
Humans
Periodicals as Topic*
PubMed*
User-Computer Interface*

Grants and funding

Intramural NIH HHS/United States