Background: Although variation in the long-term course of major depressive disorder (MDD) is not strongly predicted by existing symptom subtype distinctions, recent research suggests that prediction can be improved by using machine learning methods. However, it is not known whether these distinctions can be refined by added information about co-morbid conditions. The current report presents results on this question.
Method: Data came from 8261 respondents with lifetime DSM-IV MDD in the World Health Organization (WHO) World Mental Health (WMH) Surveys. Outcomes included four retrospectively reported measures of persistence/severity of course (years in episode; years in chronic episodes; hospitalization for MDD; disability due to MDD). Machine learning methods (regression tree analysis; lasso, ridge and elastic net penalized regression) followed by k-means cluster analysis were used to augment previously detected subtypes with information about prior co-morbidity to predict these outcomes.
Results: Predicted values were strongly correlated across outcomes. Cluster analysis of predicted values found three clusters with consistently high, intermediate or low values. The high-risk cluster (32.4% of cases) accounted for 56.6-72.9% of high persistence, high chronicity, hospitalization and disability. This high-risk cluster had both higher sensitivity and likelihood ratio positive (LR+; relative proportions of cases in the high-risk cluster versus other clusters having the adverse outcomes) than in a parallel analysis that excluded measures of co-morbidity as predictors.
Conclusions: Although the results using the retrospective data reported here suggest that useful MDD subtyping distinctions can be made with machine learning and clustering across multiple indicators of illness persistence/severity, replication with prospective data is needed to confirm this preliminary conclusion.