Validating a natural language processing tool to exclude psychogenic nonepileptic seizures in electronic medical record-based epilepsy research

Epilepsy Behav. 2013 Dec;29(3):578-80. doi: 10.1016/j.yebeh.2013.09.025. Epub 2013 Oct 14.


Rationale: As electronic health record (EHR) systems become more available, they will serve as an important resource for collecting epidemiologic data in epilepsy research. However, since clinicians do not have a systematic method for coding psychogenic nonepileptic seizures (PNES), patients with PNES are often misclassified as having epilepsy, leading to sampling error. This study validates a natural language processing (NLP) tool that uses linguistic information to help identify patients with PNES.

Methods: Using the VA national clinical database, 2200 notes from Iraq and Afghanistan veterans who completed video electroencephalography (VEEG) monitoring were manually reviewed to determine whether each veteran had documented PNES. Reviewers identified PNES-related vocabulary to inform an NLP tool called Yale cTAKES Extensions (YTEX). Using NLP techniques, YTEX annotates syntactic constructs, named entities, and their negation context in the EHR. These annotations are passed to a classifier to detect patients without PNES. The classifier was evaluated by calculating positive predictive values (PPVs), sensitivity, and F-score.
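The combination of vocabulary matching and negation context described in the Methods can be illustrated with a small sketch. It uses a simplified NegEx-style rule: a vocabulary term counts as negated if a trigger such as "no" or "denies" appears within a few tokens before it. The term list, trigger list, window size, and the `annotate`/`classify_note` functions are all hypothetical illustrations, not the study's vocabulary or YTEX's implementation; YTEX extends the cTAKES pipeline and feeds its annotations to a trained classifier rather than to a fixed rule.

```python
import re
from dataclasses import dataclass

# Hypothetical PNES-related vocabulary (illustrative only, not the study's list).
PNES_TERMS = [
    "psychogenic nonepileptic",
    "psychogenic non-epileptic",
    "pnes",
    "pseudoseizure",
    "conversion disorder",
]

# Hypothetical pre-negation triggers in the spirit of NegEx.
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "ruled out", "negative for"]

# Number of tokens before a term that are searched for a trigger (assumed value).
WINDOW = 5


@dataclass
class Annotation:
    term: str
    start: int  # token index where the matched term begins
    negated: bool


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and hyphens.
    return re.findall(r"[a-z0-9\-]+", text.lower())


def annotate(text: str) -> list[Annotation]:
    """Find vocabulary terms and flag those preceded by a negation trigger."""
    tokens = tokenize(text)
    annotations = []
    for term in PNES_TERMS:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                # Look back up to WINDOW tokens for a whole-word trigger.
                context = " ".join(tokens[max(0, i - WINDOW):i])
                negated = any(
                    re.search(rf"\b{re.escape(t)}\b", context)
                    for t in NEGATION_TRIGGERS
                )
                annotations.append(Annotation(term, i, negated))
    return annotations


def classify_note(text: str) -> str:
    # Toy decision rule: any affirmed (non-negated) PNES mention flags the note.
    if any(not a.negated for a in annotate(text)):
        return "PNES"
    return "no PNES"


print(classify_note("VEEG captured typical events consistent with psychogenic nonepileptic seizures."))
print(classify_note("No evidence of PNES; events were epileptic."))
```

This sketch only handles negation cues that precede a term, so a phrasing such as "PNES was ruled out" would be missed; fuller negation detectors also handle post-negation triggers and scope termination.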

Results: Of the 742 Iraq and Afghanistan veterans who received a diagnosis of epilepsy or seizure disorder by VEEG, 44 had events documented on VEEG: 22 veterans (3.0%) had definite PNES only, 20 (2.7%) had probable PNES, and 2 (0.3%) had both PNES and epilepsy documented. The remaining 698 veterans did not have events captured during the VEEG admission and/or did not have a definitive diagnosis. Our classifier achieved a PPV of 93%, a sensitivity of 99%, and an F-score of 96%.
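The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the F-score is the balanced F1 score (the harmonic mean of PPV and sensitivity), which the abstract does not state explicitly. Since the classifier is described as detecting patients without PNES, the positive class is presumably "no PNES". The abstract does not give confusion-matrix counts, so the `ppv` and `sensitivity` helpers only show the definitions, and the check uses the reported rates and cohort counts directly.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(precision: float, recall: float) -> float:
    """F1 score (assumed): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency of the reported F-score with the reported PPV and sensitivity.
print(f"F-score from PPV 0.93 and sensitivity 0.99: {f_score(0.93, 0.99):.3f}")  # 0.959

# Reproduce the cohort percentages from the counts given in the Results.
cohort = 742
for label, n in [("definite PNES only", 22), ("probable PNES", 20), ("PNES and epilepsy", 2)]:
    print(f"{label}: {n}/{cohort} = {100 * n / cohort:.1f}%")  # 3.0%, 2.7%, 0.3%
```

An F1 of about 0.959 rounds to the reported 96%, and the three subgroup percentages match the counts out of 742, so the figures are internally consistent under the F1 assumption.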

Conclusion: Our study demonstrates that the YTEX NLP tool and classifier are highly accurate in excluding VEEG-diagnosed PNES in EHR systems. The tool may be very valuable in preventing false-positive identification of patients with epilepsy in EHR-based epidemiologic research.

Keywords: Electronic health record; Epidemiology; Natural language processing; Psychogenic nonepileptic seizures.

MeSH terms

  • Afghan Campaign 2001-
  • Biomedical Research*
  • Electronic Health Records / statistics & numerical data*
  • Epilepsy* / diagnosis
  • Epilepsy* / epidemiology
  • Epilepsy* / therapy
  • Female
  • Humans
  • Iraq War, 2003-2011
  • Male
  • Natural Language Processing*
  • Reproducibility of Results
  • United States / epidemiology
  • United States Department of Veterans Affairs / statistics & numerical data