Symptom-based patient stratification in mental illness using clinical notes

J Biomed Inform. 2019 Oct:98:103274. doi: 10.1016/j.jbi.2019.103274. Epub 2019 Sep 6.


Mental illnesses are highly heterogeneous, and their diagnoses rest on symptoms that are generally qualitative, subjective, and documented in free-text clinical notes rather than as structured data. Moreover, symptoms vary significantly within diagnostic categories and overlap substantially between them. These factors add to the difficulty of phenotyping patients with mental illness, a task that has proven hard even for seemingly well-characterized diseases. The ability to identify more homogeneous patient groups could both increase our ability to apply a precision medicine approach to psychiatric disorders and help elucidate the underlying biological mechanisms of pathology. We describe a novel approach to deep phenotyping in mental illness in which contextual term extraction is used to identify constellations of symptoms in a cohort of patients diagnosed with schizophrenia and related disorders. We applied topic modeling and dimensionality reduction to identify similar groups of patients and evaluated the resulting clusters through visualization and interrogation of clinically interpretable weighted features. Our findings show that patients diagnosed with schizophrenia may be meaningfully stratified using symptom-based clustering.
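The abstract does not detail the specific extraction tool, topic model, or reduction method, so the following is only a minimal sketch of the general pipeline under loudly stated assumptions. A crude NegEx-style negation check stands in for contextual term extraction, latent Dirichlet allocation (LDA) fit by collapsed Gibbs sampling stands in for topic modeling, and each patient is grouped by their dominant topic. The symptom lexicon, negation triggers, function names (`extract_symptoms`, `lda_gibbs`), hyperparameters, and the toy notes are all hypothetical and are not the authors' method or data. The per-patient topic proportions (`theta`) are themselves a low-dimensional representation; a further projection for visualization (for example t-SNE) is omitted here.

```python
import random
import re

# Hypothetical symptom lexicon for illustration; not the study's vocabulary.
SYMPTOMS = ["hallucinations", "delusions", "paranoia", "anhedonia",
            "flat affect", "social withdrawal", "insomnia"]
# Hypothetical negation triggers: a much cruder stand-in for NegEx/ConText.
NEGATIONS = ["denies", "no", "negative for", "without"]


def extract_symptoms(note):
    """Return affirmed (non-negated) symptom terms found in one clinical note."""
    found = []
    # Sentence punctuation and the word "but" end a negation's scope.
    for segment in re.split(r"[.;\n]|\bbut\b", note.lower()):
        for term in SYMPTOMS:
            pos = segment.find(term)
            if pos == -1:
                continue
            prefix = segment[:pos]
            negated = any(re.search(r"\b" + re.escape(n) + r"\b", prefix)
                          for n in NEGATIONS)
            if not negated:
                found.append(term)
    return found


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of terms.

    Returns (theta, top_terms): per-document topic proportions and the
    three highest-count terms per topic (the interpretable weighted features).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # term counts per topic
    nk = [0] * n_topics                             # total terms per topic
    z = []                                          # topic assignment per token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove the current token, then resample its topic with
                # probability proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + n_vocab * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    top_terms = [[vocab[v] for v in sorted(range(n_vocab), key=lambda v: -nkw[k][v])[:3]]
                 for k in range(n_topics)]
    return theta, top_terms


# Toy, invented notes for demonstration only (not patient data).
notes = {
    "patient_a": "Reports auditory hallucinations. Persistent paranoia and delusions.",
    "patient_b": "Denies hallucinations. Anhedonia and flat affect; social withdrawal noted.",
    "patient_c": "Paranoia and delusions about neighbors. No insomnia.",
    "patient_d": "Flat affect, anhedonia. Insomnia reported but no delusions.",
}
ids = list(notes)
docs = [extract_symptoms(notes[p]) for p in ids]
theta, top_terms = lda_gibbs(docs, n_topics=2)
# Stratify each patient by their dominant symptom topic.
groups = {p: max(range(2), key=lambda k: row[k]) for p, row in zip(ids, theta)}
print(groups, top_terms)
```

Context handling matters here: "Denies hallucinations" must not count as a hallucination, which is why extraction is negation-aware rather than a plain keyword match. A real pipeline would substitute a validated clinical NLP tool and a library topic model for these toy versions, and would choose the number of topics empirically rather than fixing it at two.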

Keywords: Disease stratification; Natural language processing; Precision medicine; Schizophrenia; Symptoms.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Algorithms
  • Cluster Analysis
  • Electronic Health Records
  • Female
  • Humans
  • Male
  • Medical Informatics / methods*
  • Mental Disorders / diagnosis*
  • Mental Disorders / physiopathology
  • Middle Aged
  • Natural Language Processing
  • Phenotype
  • Precision Medicine / methods
  • Schizophrenia / diagnosis*
  • Schizophrenia / physiopathology
  • Stochastic Processes
  • Symptom Assessment / methods*