Disease named entity recognition from biomedical literature using a novel convolutional neural network

Zhehuan Zhao; Zhihao Yang; Ling Luo; Lei Wang; Yin Zhang; Hongfei Lin; Jian Wang

doi:10.1186/s12920-017-0316-8

BMC Med Genomics. 2017 Dec 28;10(Suppl 5):73. doi: 10.1186/s12920-017-0316-8.

Authors

Zhehuan Zhao¹, Zhihao Yang², Ling Luo¹, Lei Wang³, Yin Zhang⁴, Hongfei Lin¹, Jian Wang¹

Affiliations

¹ College of Computer Science and Technology, Dalian University of Technology, Dalian, 116023, China.
² College of Computer Science and Technology, Dalian University of Technology, Dalian, 116023, China. yangzh@dlut.edu.cn.
³ Beijing Institute of Health Administration and Medical Information, Beijing, 100850, China. wangleibihami@gmail.com.
⁴ Beijing Institute of Health Administration and Medical Information, Beijing, 100850, China.

Abstract

Background: Automatic disease named entity recognition (DNER) is of utmost importance for the development of more sophisticated BioNLP tools. However, most conventional CRF-based DNER systems rely on well-designed features whose selection is labor-intensive and time-consuming. Although most deep learning methods can solve NER problems with little feature engineering, they employ an additional CRF layer to capture the correlation information between neighboring labels, which makes them considerably more complicated.

Methods: In this paper, we propose a novel multiple label convolutional neural network (MCNN) based disease NER approach. Instead of a CRF layer, this approach employs a multiple label strategy (MLS), first introduced by us. First, the character-level embedding, word-level embedding, and lexicon feature embedding are concatenated. Then several convolutional layers are stacked over the concatenated embedding. Finally, the MLS is applied to the output layer to capture the correlation information between neighboring labels.
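
The first two steps of the pipeline described above (concatenating the three embeddings, then stacking convolutional layers over the result) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy sentence, the embedding sizes, the mean pooling used to build the character-level vector, the binary lexicon feature, the kernel width of 3, the ReLU activation, and the random untrained weights. The MLS output layer is deliberately omitted because the abstract does not specify its form.

```python
import random

rng = random.Random(0)

# Hypothetical toy sentence and lexicon (not from the paper).
tokens = ["breast", "cancer", "risk"]
lexicon = {"cancer"}

# Assumed embedding sizes, chosen only to keep the example small.
WORD_DIM, CHAR_DIM, LEX_DIM, HIDDEN = 4, 3, 2, 5

def make_embedding(n_rows, dim):
    """Random lookup table standing in for a trained embedding matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]

vocab = {w: i for i, w in enumerate(tokens)}
char_index = {c: i for i, c in enumerate(sorted({c for w in tokens for c in w}))}
word_emb = make_embedding(len(vocab), WORD_DIM)
char_emb = make_embedding(len(char_index), CHAR_DIM)
lex_emb = make_embedding(2, LEX_DIM)  # row 0: not in lexicon, row 1: in lexicon

def char_vector(word):
    """Character-level vector via mean pooling (an assumption of this sketch)."""
    vecs = [char_emb[char_index[c]] for c in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Step 1: concatenate word, character-level and lexicon embeddings per token.
features = [
    word_emb[vocab[w]] + char_vector(w) + lex_emb[int(w in lexicon)]
    for w in tokens
]

def make_kernels(d_out, width, d_in):
    """d_out filters, each spanning `width` tokens of d_in features."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
             for _ in range(width)] for _ in range(d_out)]

def conv1d_same(seq, kernels, bias):
    """1D convolution over the token axis with zero padding, then ReLU."""
    width = len(kernels[0])
    pad = width // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + width]
        row = []
        for kernel, b in zip(kernels, bias):
            s = b + sum(w * x
                        for k_vec, x_vec in zip(kernel, window)
                        for w, x in zip(k_vec, x_vec))
            row.append(max(0.0, s))
        out.append(row)
    return out

# Step 2: stack two convolutional layers over the concatenated embedding.
in_dim = WORD_DIM + CHAR_DIM + LEX_DIM
layers = [
    (make_kernels(HIDDEN, 3, in_dim), [0.0] * HIDDEN),
    (make_kernels(HIDDEN, 3, HIDDEN), [0.0] * HIDDEN),
]
hidden = features
for kernels, bias in layers:
    hidden = conv1d_same(hidden, kernels, bias)

print(len(hidden), len(hidden[0]))  # one HIDDEN-sized vector per token
```

Zero padding keeps one hidden vector per input token, so a per-token output layer (such as the MLS layer the abstract describes) could be placed on top of `hidden`.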

Results: The experimental results show that MCNN achieves state-of-the-art performance on both the NCBI and CDR corpora.

Conclusions: The proposed MCNN-based disease NER method achieves state-of-the-art performance with little feature engineering. The experimental results also show that the MLS is effective at capturing the correlation information between neighboring labels.

Keywords: Convolutional neural network; Deep learning; Multiple label strategy; Disease; Named entity recognition.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomedical Research*
Data Mining / methods*
Disease*
Neural Networks, Computer*