Named entity recognition in electronic health records using transfer learning bootstrapped neural networks

Neural Netw. 2020 Jan;121:132-139. doi: 10.1016/j.neunet.2019.08.032. Epub 2019 Sep 6.

Abstract

Neural networks (NNs) have become the state of the art in many machine learning applications, such as image and sound processing (LeCun et al., 2015) and natural language processing (Young et al., 2017; Linggard et al., 2012). However, the success of NNs remains dependent on the availability of large labelled datasets, which are scarce in domains such as electronic health records (EHRs). With scarce labelled data, NNs are unlikely to be able to extract the information hidden in EHRs with practical accuracy. In this study, we develop an approach that solves these problems for named entity recognition, obtaining an F1 score of 94.6 on the I2B2 2009 Medical Extraction Challenge (Uzuner et al., 2010), 4.3 points above the architecture that won the competition. To achieve this, we bootstrap our NN models through transfer learning: word embeddings are pretrained on a secondary task over a large pool of unannotated EHRs, and the resulting embeddings are used as the foundation of a range of NN architectures. Beyond the official I2B2 challenge, we further achieve an F1 score of 82.4 on extracting relationships between medical terms using attention-based seq2seq models bootstrapped in the same manner.
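The sketch below is not the authors' code; it is a minimal illustration, under assumed tooling (gensim for embedding pretraining, Keras for the tagger), of the bootstrapping idea the abstract describes: word embeddings are first pretrained on unannotated EHR text and then used to initialise the embedding layer of a BiLSTM sequence tagger for named entity recognition. The corpus, hyperparameters, and tag count are illustrative placeholders.

```python
# Minimal sketch of transfer-learning bootstrapping for NER (assumptions:
# gensim >= 4.0, TensorFlow/Keras; toy data stands in for unannotated EHRs).
import numpy as np
from gensim.models import Word2Vec
import tensorflow as tf

# 1. "Secondary task": unsupervised embedding pretraining on unannotated EHR text.
unannotated_ehr_sentences = [
    ["patient", "denies", "chest", "pain"],
    ["start", "aspirin", "81", "mg", "daily"],
]
w2v = Word2Vec(unannotated_ehr_sentences, vector_size=50, window=5,
               min_count=1, workers=2, epochs=20)

# 2. Build an embedding matrix indexed by a task vocabulary (index 0 = padding).
vocab = {word: i + 1 for i, word in enumerate(w2v.wv.index_to_key)}
emb = np.zeros((len(vocab) + 1, w2v.vector_size), dtype=np.float32)
for word, idx in vocab.items():
    emb[idx] = w2v.wv[word]

# 3. BiLSTM tagger whose embedding layer is bootstrapped from the pretrained vectors.
num_tags = 5  # e.g. BIO tags for the entity types of interest (placeholder)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        input_dim=emb.shape[0], output_dim=emb.shape[1],
        embeddings_initializer=tf.keras.initializers.Constant(emb),
        mask_zero=True, trainable=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_tags, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

The same pretrained embeddings could, in principle, initialise an attention-based seq2seq model for the relation-extraction task mentioned in the abstract; only the downstream architecture changes, not the bootstrapping step.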

Keywords: Electronic health records; LSTM; NLP; Named entity recognition; Neural Networks; Transfer learning.

MeSH terms

  • Data Collection / classification
  • Data Collection / methods
  • Electronic Health Records / classification*
  • Humans
  • Machine Learning / classification*
  • Natural Language Processing*
  • Neural Networks, Computer*