Overview of the BioCreative VI text-mining services for Kinome Curation Track

Julien Gobeill; Pascale Gaudet; Daniel Dopp; Adam Morrone; Indika Kahanda; Yi-Yu Hsu; Chih-Hsuan Wei; Zhiyong Lu; Patrick Ruch

doi:10.1093/database/bay104

Database (Oxford). 2018 Jan 1:2018:bay104. doi: 10.1093/database/bay104.

Authors

Julien Gobeill¹,², Pascale Gaudet¹, Daniel Dopp³, Adam Morrone⁴, Indika Kahanda⁵, Yi-Yu Hsu⁶, Chih-Hsuan Wei⁶, Zhiyong Lu⁶, Patrick Ruch¹,²

Affiliations

¹ SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland.
² HES-SO / HEG Geneva, Information Sciences, Geneva, Switzerland.
³ University of Kentucky, Lexington, KY, USA.
⁴ Liberty University, Lynchburg, VA, USA.
⁵ Montana State University, Bozeman, MT, USA.
⁶ National Center for Biotechnology Information, Bethesda, MD, USA.

Abstract

The text-mining services for kinome curation track, part of BioCreative VI, proposed a competition to assess the effectiveness of text mining for literature triage. The track used an unpublished curated data set from the neXtProt database, containing comprehensive annotations for 300 human protein kinases. For a given protein and a given curation axis [diseases or Gene Ontology (GO) biological processes], participants' systems had to identify and rank relevant articles in a collection of 5.2 million MEDLINE citations (task 1) or 530 000 full-text articles (task 2). The strategies explored comprised named-entity recognition and machine-learning frameworks. For the latter approach, participants developed methods to derive a set of negative instances, as databases typically do not store articles that curators judged irrelevant. The supervised approaches proposed by the participating groups achieved significant improvements over both the baseline established in a previous study and a basic PubMed search.
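
The negative-instance problem mentioned above can be illustrated with a minimal sketch. It is not a method described by the track participants: it assumes, purely for illustration, that any article retrieved for a protein but absent from the curated set can be treated as a presumed negative. The function name `derive_negatives`, its parameters and the article identifiers are all hypothetical.

```python
import random


def derive_negatives(retrieved_ids, curated_ids, n, seed=0):
    """Sample up to n presumed-negative article IDs for one protein.

    Assumption (not from the source): an article that was retrieved for the
    protein but is absent from the curated set is treated as irrelevant,
    although it may simply never have been examined by a curator.
    """
    positives = set(curated_ids)
    # Sort the pool so that sampling is reproducible for a given seed.
    pool = sorted(set(retrieved_ids) - positives)
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


# Hypothetical identifiers, for illustration only.
retrieved = ["111", "222", "333", "444", "555"]
curated = ["222", "444"]
negatives = derive_negatives(retrieved, curated, n=2)
```

The presumed negatives and the curated positives could then serve as training data for a supervised ranker. Because uncurated articles are not guaranteed to be irrelevant, such a sample is likely to contain some noise.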

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Mining*
Databases, Factual
Humans
Periodicals as Topic
Protein Kinases / metabolism*

Substances

Protein Kinases