Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record

Artif Intell Med. Sep-Oct 2003;29(1-2):169-84. doi: 10.1016/s0933-3657(03)00052-6.


In this article, we show how a set of natural language processing (NLP) tools can be combined to improve the processing of clinical records. The study concentrates on improving spelling correction, which is of major importance for quality control in the electronic patient record (EPR). As a first task, we report on the design of an improved interactive tool for correcting spelling errors. Unlike traditional systems, it uses the linguistic context (both semantic and syntactic) to improve the correction strategy. The system is organized into three modules. Module 1 is based on a classical spelling checker: it is context-independent and simply measures the string edit distance between a misspelled word and a list of well-formed words. Module 2 uses morpho-syntactic disambiguation tools to rank the candidates provided by the first module more relevantly. Module 3 processes words with the same part-of-speech (POS) and applies word-sense (WS) disambiguation to rerank the set of candidates. As a second task, we show how this improved interactive spell checker can be turned into a fully automatic system by adding another NLP module: a named-entity (NE) extractor, i.e. a tool able to identify words such as patient and physician names. This module is used to avoid replacing named entities when the system is not used in interactive mode. Results confirm that using the linguistic context can improve interactive spelling correction, and justify the use of a named-entity recognizer to conduct fully automatic spelling correction. It is concluded that NLP is mature enough to help information processing in the EPR.
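The context-independent step (Module 1) and the named-entity guard for automatic mode can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, the `max_distance` threshold, and the use of a plain set as a stand-in for the NE extractor are all assumptions made for illustration, and the context-based reranking of Modules 2 and 3 is omitted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance of word, closest first.

    max_distance is a hypothetical cutoff, not a value from the article.
    """
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


def correct_token(token, lexicon, named_entities):
    """Automatic mode: never replace a token flagged as a named entity.

    named_entities is a plain set standing in for a real NE extractor.
    """
    if token in named_entities or token in lexicon:
        return token
    candidates = rank_candidates(token, lexicon)
    return candidates[0] if candidates else token
```

With a lexicon of `["fever", "liver", "smite"]`, the misspelling `fevr` is mapped to `fever`. The surname `Smith` would be rewritten to `smite` unless it appears in `named_entities`, which is the kind of unwanted replacement the NE module is meant to prevent.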

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Humans
  • Language
  • Medical Records Systems, Computerized*
  • Names
  • Pattern Recognition, Automated*
  • Quality Control