An Overview of BioCreative II.5

Florian Leitner; Scott A Mardis; Martin Krallinger; Gianni Cesareni; Lynette A Hirschman; Alfonso Valencia

doi:10.1109/tcbb.2010.61

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):385-99. doi: 10.1109/tcbb.2010.61.

Authors

Florian Leitner¹, Scott A Mardis, Martin Krallinger, Gianni Cesareni, Lynette A Hirschman, Alfonso Valencia

Affiliation

¹ Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, Madrid, Spain. valencia@cnio.es

PMID: 20704011

Abstract

We present the results of the BioCreative II.5 evaluation, held in association with the FEBS Letters experiment in which authors created Structured Digital Abstracts to capture information about protein-protein interactions. The challenge evaluated automatic annotations from 15 text mining teams against a gold standard created by reconciling annotations from curators, authors, and automated systems. There were three tasks: to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the 61 positive articles; and to identify interacting protein pairs. The evaluation test set comprised 595 full-text articles, both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R) and the balanced F-measure. For article classification, the best AUC iP/R was 0.70. For interacting proteins, the best system achieved a macro-averaged recall of 0.73 and an AUC iP/R of 0.58 after filtering incorrect species and mapping homonymous orthologs. For interacting protein pairs, the top (filtered, mapped) recall was 0.42 and the AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
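
The two metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes one common formulation of the interpolated precision/recall curve, in which the precision at recall r is the maximum precision observed at any recall at or above r; this is not necessarily the exact procedure used by the challenge organizers. The function names `f_measure` and `auc_ipr` and the toy ranking in the usage note are hypothetical and not taken from the paper.

```python
from typing import Sequence


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_ipr(ranked_hits: Sequence[bool], total_positives: int) -> float:
    """Area under the interpolated precision/recall curve of a ranked list.

    ranked_hits[i] is True when the result at rank i + 1 is correct.
    total_positives is the number of correct items in the gold standard
    (assumed here to be supplied by the caller).
    """
    if total_positives == 0:
        return 0.0

    # Raw precision at every rank where a correct item is retrieved.
    # Each such rank raises recall by exactly 1 / total_positives.
    precisions = []
    tp = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)

    # Interpolate from the highest recall downwards: the interpolated
    # precision at a recall level is the best precision at that level
    # or at any higher recall level.
    area = 0.0
    best = 0.0
    for precision in reversed(precisions):
        best = max(best, precision)
        area += best / total_positives  # precision times recall step
    return area
```

For a toy ranking `[True, False, True, False]` with two gold-standard positives, `auc_ipr` yields 5/6: an interpolated precision of 1.0 over the first half of the recall range and 2/3 over the second half. Positives that the system never retrieves contribute no area, so incomplete rankings are penalized.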

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Abstracting and Indexing*
Computational Biology / methods*
Data Collection / methods
Data Mining / methods*
Database Management Systems
Databases, Factual
Information Management / methods*
Natural Language Processing
Protein Interaction Mapping / classification*

Grants and funding

GGP09243/TI_/Telethon/Italy