High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records

Biol Psychiatry. 2018 Jun 15;83(12):997-1004. doi: 10.1016/j.biopsych.2018.01.011. Epub 2018 Feb 26.


Background: Relying on diagnostic categories of neuropsychiatric illness obscures the complexity of these disorders. Capturing multiple dimensional measures of neuropathology could facilitate the clinical and neurobiological investigation of cognitive and behavioral phenotypes.

Methods: We developed a natural language processing (NLP)-based approach to extract five symptom dimensions, based on the National Institute of Mental Health Research Domain Criteria definitions, from narrative clinical notes. Estimates of Research Domain Criteria loading were derived from a cohort of 3619 individuals with 4623 hospital admissions. We applied this tool to a large corpus of psychiatric inpatient admission and discharge notes (2010-2015) and, using the same cohort, examined face validity, predictive validity, and convergent validity with gold-standard annotations.
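The abstract does not spell out how a note is turned into domain scores. As a minimal sketch only, assuming a simple lexicon-matching scheme in which each domain has a seed-term list and a note's score for that domain is the fraction of the list found in the note, the idea might look like the Python below. The `DOMAIN_TERMS` lists and the `tokenize` and `score_note` functions are hypothetical illustrations, not the study's lexicons or code; the keywords also mention topic modeling, which this sketch does not implement.

```python
import re

# Hypothetical seed-term lists, one per domain; NOT the study's actual lexicons.
DOMAIN_TERMS: dict[str, set[str]] = {
    "negative_valence": {"anxious", "fear", "sad", "hopeless", "guilt"},
    "positive_valence": {"reward", "pleasure", "motivation", "euphoric"},
    "cognitive": {"attention", "memory", "confused", "disorganized"},
    "social": {"isolated", "withdrawn", "relationships", "family"},
    "arousal": {"sleep", "insomnia", "agitated", "energy"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_note(text: str, domains: dict[str, set[str]] = DOMAIN_TERMS) -> dict[str, float]:
    """Score each domain as the fraction of its terms that occur at least once in the note."""
    tokens = set(tokenize(text))
    return {domain: len(tokens & terms) / len(terms) for domain, terms in domains.items()}


note = "Patient reports feeling hopeless and sad, with poor sleep and trouble with memory."
scores = score_note(note)
print(scores)
# {'negative_valence': 0.4, 'positive_valence': 0.0, 'cognitive': 0.25, 'social': 0.0, 'arousal': 0.25}
```

A term-proportion score of this kind is easy to audit, because every nonzero domain score traces back to specific matched terms; a practical pipeline would still need negation handling and larger, validated term lists.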

Results: In mixed-effects models adjusted for sociodemographic and clinical features, greater negative and positive valence domain scores were associated with a shorter length of stay (β = -.88, p = .001 and β = -1.22, p < .001, respectively), whereas greater social and arousal domain scores were associated with a longer length of stay (β = .93, p < .001 and β = .81, p = .007, respectively). In fully adjusted Cox regression models, a greater positive valence domain score at discharge was also associated with a significant increase in readmission risk (hazard ratio = 1.22, p < .001). Positive and negative valence domain scores were correlated with expert annotation (by analysis of variance [df = 3], R² = .13 and .19, respectively). Likewise, in a subset of patients, neurocognitive testing was correlated with cognitive performance scores (p < .008 for three of six measures).
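The analysis-of-variance R² values can be read as the share of variance in a domain score explained by the expert-annotation group. Assuming a one-way ANOVA in which df = 3 corresponds to four annotation levels, R² is the between-group sum of squares divided by the total sum of squares (eta squared). The hypothetical `anova_r_squared` helper and the toy scores below are illustrations only, not study data.

```python
def anova_r_squared(groups: list[list[float]]) -> tuple[float, int]:
    """Return (R^2, between-group df) for a one-way ANOVA over the given groups."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    # Total variability of all scores around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total, len(groups) - 1


# Toy domain scores for four hypothetical expert-annotation levels (df = 3).
toy_groups = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
r2, df = anova_r_squared(toy_groups)
print(f"R^2 = {r2:.3f}, df = {df}")  # R^2 = 0.952, df = 3
```

The toy groups are deliberately well separated, so R² is close to 1; values such as .13 and .19 mean that annotation group accounts for 13% and 19% of the variance in the respective domain scores, with most variability lying within groups.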

Conclusions: These results show that natural language processing can be used to score clinical notes efficiently and transparently in terms of cognitive and psychopathologic domains.

Keywords: Computed phenotype; Electronic health record; Natural language processing; Research Domain Criteria; Topic modeling; Transdiagnostic.

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Cohort Studies
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Hospitalization
  • Humans
  • Male
  • Mental Disorders / diagnosis*
  • Mental Disorders / psychology*
  • Middle Aged
  • Natural Language Processing
  • Neuropsychological Tests
  • Phenotype
  • Psychiatric Status Rating Scales
  • Psychopathology*