Disease named entity recognition from biomedical literature using a novel convolutional neural network

BMC Med Genomics. 2017 Dec 28;10(Suppl 5):73. doi: 10.1186/s12920-017-0316-8.

Abstract

Background: Automatic disease named entity recognition (DNER) is of utmost importance for the development of more sophisticated BioNLP tools. However, most conventional DNER systems based on conditional random fields (CRFs) rely on well-designed features whose selection is labor-intensive and time-consuming. Although most deep learning methods can solve NER problems with little feature engineering, they employ an additional CRF layer to capture the correlation information between neighboring labels, which makes them much more complicated.

Methods: In this paper, we propose a novel disease NER approach based on a multiple label convolutional neural network (MCNN). In this approach, a multiple label strategy (MLS), first introduced by us, is employed instead of the CRF layer. First, the character-level embedding, word-level embedding and lexicon feature embedding are concatenated. Then several convolutional layers are stacked over the concatenated embedding. Finally, the MLS is applied to the output layer to capture the correlation information between neighboring labels.
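
The first two steps of the pipeline described in Methods (concatenating the three embeddings per token, then stacking convolutional layers over the result) can be illustrated with a minimal pure-Python sketch. All names, dimensions, kernel widths and the ReLU activation below are illustrative assumptions, not the authors' implementation; the weights are random rather than trained, and the MLS output layer is omitted because the abstract does not specify its details.

```python
import random

def concat_embeddings(char_emb, word_emb, lex_emb):
    """Concatenate the three per-token feature vectors into one vector per token."""
    return [c + w + l for c, w, l in zip(char_emb, word_emb, lex_emb)]

def conv1d_same(seq, kernels, bias):
    """1D convolution along the token axis with zero padding, followed by ReLU.

    seq:     list of T vectors, each of size d_in
    kernels: list of d_out filters, each a list of k vectors of size d_in (k odd)
    bias:    list of d_out floats
    Returns a list of T vectors, each of size d_out.
    """
    k = len(kernels[0])
    pad = k // 2
    d_in = len(seq[0])
    zero = [0.0] * d_in
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]  # the k tokens centered on position t
        row = []
        for filt, b in zip(kernels, bias):
            s = b
            for j in range(k):
                for i in range(d_in):
                    s += filt[j][i] * window[j][i]
            row.append(max(0.0, s))  # ReLU (an assumed activation)
        out.append(row)
    return out

def random_kernels(rng, d_out, k, d_in):
    """Hypothetical helper: small random filters standing in for trained weights."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
             for _ in range(k)]
            for _ in range(d_out)]

def encode(char_emb, word_emb, lex_emb, layers):
    """Concatenate the embeddings, then apply the stacked convolutional layers."""
    h = concat_embeddings(char_emb, word_emb, lex_emb)
    for kernels, bias in layers:
        h = conv1d_same(h, kernels, bias)
    return h

# Toy sentence of 5 tokens with assumed embedding sizes 3 (char), 4 (word), 2 (lexicon).
rng = random.Random(0)
T = 5
char_emb = [[rng.random() for _ in range(3)] for _ in range(T)]
word_emb = [[rng.random() for _ in range(4)] for _ in range(T)]
lex_emb = [[rng.random() for _ in range(2)] for _ in range(T)]

# Two stacked convolutional layers: 9 -> 6 -> 5 channels, kernel width 3.
layers = [
    (random_kernels(rng, 6, 3, 9), [0.0] * 6),
    (random_kernels(rng, 5, 3, 6), [0.0] * 5),
]
features = encode(char_emb, word_emb, lex_emb, layers)
```

Here `features` holds one vector per token, and each vector depends on a window of neighboring tokens that widens with every stacked layer. In the approach described above, the MLS would then be applied at the output layer over such per-token features.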

Results: Experimental results show that MCNN achieves state-of-the-art performance on both the NCBI and CDR corpora.

Conclusions: The proposed MCNN-based disease NER method achieves state-of-the-art performance with little feature engineering. The experimental results also show that the MLS is effective at capturing the correlation information between neighboring labels.

Keywords: Convolutional neural network; Deep learning; Multiple label strategy; Disease; Named entity recognition.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research*
  • Data Mining / methods*
  • Disease*
  • Neural Networks, Computer*