Database (Oxford). 2018 Jan 1;2018:bay073.
doi: 10.1093/database/bay073.

Extracting Chemical-Protein Relations With Ensembles of SVM and Deep Learning Models

Free PMC article


Yifan Peng et al. Database (Oxford).

Abstract

Mining relations between chemicals and proteins from the biomedical literature is an increasingly important task. The CHEMPROT track at BioCreative VI aims to promote the development and evaluation of systems that can automatically detect chemical-protein relations in running text (PubMed abstracts). This work describes our CHEMPROT track entry, an ensemble of three systems: a support vector machine, a convolutional neural network, and a recurrent neural network. Their outputs are combined by majority voting or stacking to produce the final predictions. During the 2017 challenge, our CHEMPROT system obtained a precision of 0.7266 and a recall of 0.5735, for an F-score of 0.6410, the highest performance in the task. These results demonstrate the effectiveness of machine learning-based approaches for automatic relation extraction from biomedical literature.

Database URL: http://www.biocreative.org/tasks/biocreative-vi/track-5/
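The ensemble step and the reported scores can be illustrated with a short sketch. It combines one label per system (SVM, CNN, RNN) by majority voting and computes micro-averaged precision, recall, and F-score over positive relation labels; it does not cover the stacking variant. The "NONE" negative label, the fallback tie-breaking rule, all function names, and the example predictions are assumptions made for illustration, not the authors' implementation; the CPR group labels are used only as illustrative strings.

```python
from collections import Counter
from typing import Sequence

NEGATIVE = "NONE"  # hypothetical label meaning "no relation"


def majority_vote(predictions: Sequence[str], fallback_index: int = 0) -> str:
    """Combine one label per system (e.g. SVM, CNN, RNN) by majority voting.

    If no label wins a strict majority, return the label of the system at
    `fallback_index` (an assumed tie-breaking rule, not taken from the paper).
    """
    label, count = Counter(predictions).most_common(1)[0]
    if count * 2 > len(predictions):
        return label
    return predictions[fallback_index]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold: Sequence[str], pred: Sequence[str]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-score over positive (non-NONE) labels."""
    tp = sum(1 for g, p in zip(gold, pred) if p != NEGATIVE and p == g)
    n_pred = sum(1 for p in pred if p != NEGATIVE)  # predicted relations
    n_gold = sum(1 for g in gold if g != NEGATIVE)  # gold-standard relations
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    # Hypothetical per-pair predictions, ordered as (SVM, CNN, RNN).
    votes = [
        ("CPR:4", "CPR:4", "NONE"),
        ("CPR:3", "CPR:9", "NONE"),  # no majority: falls back to the SVM label
        ("NONE", "NONE", "CPR:6"),
    ]
    gold = ["CPR:4", "CPR:9", "NONE"]
    pred = [majority_vote(v) for v in votes]
    print(pred)                    # ['CPR:4', 'CPR:3', 'NONE']
    print(micro_prf(gold, pred))   # (0.5, 0.5, 0.5)
    print(round(f_score(0.7266, 0.5735), 4))  # 0.641
```

Applied to the reported precision and recall, `f_score` reproduces the published F-score: 2 × 0.7266 × 0.5735 / (0.7266 + 0.5735) ≈ 0.6410. Note that under micro-averaging a pair assigned the wrong positive label counts as both a false positive and a false negative.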

Figures

Figure 1. Chemical–protein annotation example.
Figure 2. Architecture of the systems for the CHEMPROT task.
Figure 3. Overview of the CNN model.
Figure 4. Overview of the RNN model.
Figure 5. Data partition of 5-fold cross-validation and final submission.
Figure 6. The distribution of pairs according to the number of approaches that can correctly classify a given pair.
Figure 7. Average sentence length and entity distance in words by the number of approaches that can correctly classify a given pair.
Figure 8. Average sentence length in words by the model that can correctly classify a given pair.
Figure 9. Average entity distance in words by the model that can correctly classify a given pair.
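Figures 6 and 7 group chemical-protein pairs by how many of the three approaches classify them correctly. The following is a minimal sketch of how such groupings could be computed, assuming per-pair gold labels, per-system predictions, and a per-pair feature (such as sentence length or entity distance in words) are already available. The function names and all example data are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Mapping, Sequence


def correct_counts(gold: Sequence[str], system_preds: Mapping[str, Sequence[str]]) -> list[int]:
    """For each pair, count how many systems predicted the gold label."""
    return [
        sum(1 for preds in system_preds.values() if preds[i] == g)
        for i, g in enumerate(gold)
    ]


def distribution(counts: Sequence[int], n_systems: int) -> dict[int, int]:
    """Number of pairs correctly classified by exactly k systems, for k = 0..n_systems."""
    tally = Counter(counts)
    return {k: tally.get(k, 0) for k in range(n_systems + 1)}


def mean_by_count(counts: Sequence[int], values: Sequence[float]) -> dict[int, float]:
    """Average of a per-pair feature (e.g. sentence length in words), grouped by k."""
    groups: dict[int, list[float]] = {}
    for k, v in zip(counts, values):
        groups.setdefault(k, []).append(v)
    return {k: mean(vs) for k, vs in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical gold labels and system outputs for four pairs.
    gold = ["CPR:4", "CPR:9", "NONE", "CPR:3"]
    system_preds = {
        "SVM": ["CPR:4", "CPR:3", "NONE", "NONE"],
        "CNN": ["CPR:4", "CPR:9", "NONE", "NONE"],
        "RNN": ["NONE", "CPR:9", "NONE", "NONE"],
    }
    sentence_lengths = [12, 30, 8, 45]  # hypothetical lengths in words

    counts = correct_counts(gold, system_preds)
    print(counts)                                    # [2, 2, 3, 0]
    print(distribution(counts, len(system_preds)))   # {0: 1, 1: 0, 2: 2, 3: 1}
    print(mean_by_count(counts, sentence_lengths))   # {0: 45, 2: 21, 3: 8}
```

Grouping by the correct-system count, rather than by individual system, yields the agreement-style view of Figures 6 and 7; Figures 8 and 9 would instead group pairs by which model classifies them correctly.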


