Exploratory data mining analysis identifying subgroups of patients with depression who are at high risk for suicide

J Clin Psychiatry. 2009 Nov;70(11):1495-500. doi: 10.4088/JCP.08m04795.


Objective: Although prior research has identified a number of separate risk factors for suicide among patients with depression, little is known about how these factors may interact to modify suicide risk. Using an empirically derived decision tree analysis of a large national sample of Veterans Affairs (VA) health system patients treated for depression, we identified subgroups with particularly high or low suicide rates.

Method: We identified 887,859 VA patients treated for depression between April 1, 1999, and September 30, 2004. We randomly split the data into 2 samples (primary and replication), developed a decision tree in the primary sample using recursive partitioning, and then tested whether the subgroups identified in the primary sample were associated with increased suicide risk in the replication sample.
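The random split followed by recursive partitioning can be illustrated with a minimal sketch. It assumes each patient is a dict of binary risk factors plus a 0/1 `outcome` field, and it uses a Gini-impurity split criterion with hypothetical `max_depth` and `min_leaf` stopping rules. The abstract does not state the study's actual splitting rule or stopping criteria, so these choices are illustrative only.

```python
import random


def split_sample(records, seed=0):
    """Randomly split records into a primary half and a replication half."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def gini(records):
    """Gini impurity of a binary outcome: 2 * p * (1 - p)."""
    if not records:
        return 0.0
    p = sum(r["outcome"] for r in records) / len(records)
    return 2 * p * (1 - p)


def best_split(records, features, min_leaf):
    """Return (gain, feature, yes_side, no_side) for the best binary split, or None."""
    parent = gini(records)
    n = len(records)
    best = None
    for f in features:
        yes = [r for r in records if r[f]]
        no = [r for r in records if not r[f]]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue  # hypothetical minimum-leaf-size rule
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        gain = parent - weighted
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, f, yes, no)
    return best


def grow_tree(records, features, depth=0, max_depth=3, min_leaf=50):
    """Recursively partition a non-empty list of records into subgroups."""
    node = {"n": len(records), "rate": sum(r["outcome"] for r in records) / len(records)}
    if depth >= max_depth:
        return node
    split = best_split(records, features, min_leaf)
    if split is None:
        return node  # no split improves impurity: this node is a leaf
    _, f, yes, no = split
    remaining = [x for x in features if x != f]
    node["feature"] = f
    node["yes"] = grow_tree(yes, remaining, depth + 1, max_depth, min_leaf)
    node["no"] = grow_tree(no, remaining, depth + 1, max_depth, min_leaf)
    return node
```

With a rare outcome such as suicide, an impurity-based criterion may find few worthwhile splits, so real analyses typically tune the criterion and leaf sizes; those settings are left as parameters here.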

Results: The exploratory data analysis produced a decision tree with subgroups of patients at differing levels of risk for suicide. These subgroups were defined by combinations of factors including a co-occurring substance use disorder diagnosis, male sex, African American race, and psychiatric hospitalization in the past year. The groups developed in the decision tree accurately discriminated between patients who did and did not die by suicide in the replication sample. The patients at highest risk for suicide were non-African American patients with a substance use disorder who had an inpatient psychiatric stay within the past 12 months.
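Checking a subgroup in the replication sample amounts to comparing its observed suicide rate with the rate among everyone else. The sketch below encodes only the highest-risk subgroup described above; the field names (`substance_use_disorder`, `african_american`, `psych_inpatient_past_year`, `suicide`) are hypothetical, and the rate ratio is an assumed summary measure, not necessarily the statistic the study reported.

```python
def in_highest_risk_group(patient):
    """Subgroup from the abstract: co-occurring substance use disorder,
    non-African American, and an inpatient psychiatric stay in the past 12 months."""
    return (
        patient["substance_use_disorder"]
        and not patient["african_american"]
        and patient["psych_inpatient_past_year"]
    )


def rate(patients):
    """Proportion of patients with the outcome (0.0 for an empty group)."""
    return sum(p["suicide"] for p in patients) / len(patients) if patients else 0.0


def compare_in_replication(replication):
    """Return (subgroup rate, rate among all others, rate ratio)."""
    group = [p for p in replication if in_highest_risk_group(p)]
    rest = [p for p in replication if not in_highest_risk_group(p)]
    r_group, r_rest = rate(group), rate(rest)
    ratio = r_group / r_rest if r_rest > 0 else float("inf")
    return r_group, r_rest, ratio
```

Because the subgroup rule is fixed in advance from the primary sample, a clearly elevated rate ratio in the held-out half is the kind of evidence the replication design is meant to provide.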

Conclusions: Study findings suggest that the identification of depressed patients at increased risk for suicide is improved through the examination of higher order interactions between potential risk factors.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cause of Death
  • Cohort Studies
  • Comorbidity
  • Cross-Sectional Studies
  • Data Mining / methods
  • Data Mining / statistics & numerical data*
  • Databases, Factual / statistics & numerical data
  • Decision Trees
  • Depressive Disorder / classification
  • Depressive Disorder / diagnosis
  • Depressive Disorder / epidemiology*
  • Diagnosis, Dual (Psychiatry)
  • Female
  • Humans
  • International Classification of Diseases / statistics & numerical data
  • Male
  • Middle Aged
  • Reproducibility of Results
  • Risk Factors
  • Substance-Related Disorders / diagnosis
  • Substance-Related Disorders / epidemiology
  • Suicide / psychology
  • Suicide / statistics & numerical data*
  • Suicide, Attempted / psychology
  • Suicide, Attempted / statistics & numerical data
  • United States / epidemiology
  • Veterans / psychology