Prediction of depression cases, incidence, and chronicity in a large occupational cohort using machine learning techniques: an analysis of the ELSA-Brasil study

Psychol Med. 2021 Dec;51(16):2895-2903. doi: 10.1017/S0033291720001579. Epub 2020 Jun 4.



Background: Depression is highly prevalent and marked by a chronic and recurrent course. Although it is a major cause of disability worldwide, little is known about the determinants of its heterogeneous course. Machine learning techniques offer an opportunity to develop tools that predict diagnosis and prognosis at the individual level.

Methods: We examined baseline (2008-2010) and follow-up (2012-2014) data from the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), a large occupational cohort study. We fitted an elastic net regularized model with 10-fold cross-validation, using socioeconomic and clinical factors as predictors, to distinguish at follow-up: (1) depressed from non-depressed participants, (2) participants with incident depression from those who did not develop depression, and (3) participants with chronic (persistent or recurrent) depression from those without depression.
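The following is a minimal sketch of the general technique named in the Methods, an elastic net penalized classifier tuned by 10-fold cross-validation, and not the study's actual pipeline. The abstract does not state the software, loss function, or hyperparameter grid, so everything here is an assumption for illustration: the logistic loss, the proximal gradient solver, the synthetic data, the fixed mixing weight `alpha`, the grid `lambdas`, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def linear_score(row, w, b):
    return b + sum(wj * xj for wj, xj in zip(w, row))


def fit_elastic_net(X, y, lam, alpha, lr=0.5, epochs=100):
    """Proximal gradient descent on the mean logistic loss plus
    lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    The intercept b is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(linear_score(row, w, b)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge term) ...
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            # ... followed by the L1 proximal step.
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


def log_loss(X, y, w, b):
    total = 0.0
    for row, label in zip(X, y):
        prob = min(max(sigmoid(linear_score(row, w, b)), 1e-12), 1 - 1e-12)
        total -= label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return total / len(y)


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_loss(X, y, lam, alpha, k=10, seed=0):
    """Mean held-out log loss across k folds for one (lam, alpha) pair."""
    losses = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_elastic_net([X[i] for i in train], [y[i] for i in train],
                               lam, alpha)
        losses.append(log_loss([X[i] for i in fold], [y[i] for i in fold], w, b))
    return sum(losses) / k


# Synthetic stand-in data (assumption): 5 predictors, only the first two matter.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(120)]
y = [1 if 2 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) > 0 else 0 for r in X]

lambdas = [0.001, 0.01, 0.1]  # hypothetical penalty grid
cv_results = {lam: cv_loss(X, y, lam, alpha=0.5) for lam in lambdas}
best_lam = min(cv_results, key=cv_results.get)
w, b = fit_elastic_net(X, y, best_lam, alpha=0.5)
print("selected lambda:", best_lam)
print("coefficients:", [round(wj, 3) for wj in w])
```

In practice one would more likely use an established implementation (for example, a penalized logistic regression in a statistics library) and evaluate the tuned model on a separate test set, as the Results describe; the from-scratch version is shown only to make the penalty and the cross-validation loop explicit.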

Results: We assessed 15 105 participants at wave 1 (baseline) and 13 922 at wave 2 (follow-up). In the test dataset, the elastic net regularization model distinguished outcome levels with an area under the curve (AUC) of 0.79 (95% CI 0.76-0.82), 0.71 (95% CI 0.66-0.77), and 0.90 (95% CI 0.86-0.95) for analyses 1, 2, and 3, respectively.
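To make the reported metric concrete, here is a small sketch of how an area under the ROC curve and a 95% confidence interval can be computed from predicted scores. The abstract does not say how the intervals were obtained; the percentile bootstrap used below is one common choice and is an assumption here, as are the function names and the synthetic scores.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the AUC each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic example (assumption): positives tend to receive higher scores.
rng = random.Random(1)
y_true = [1 if rng.random() < 0.3 else 0 for _ in range(100)]
scores = [rng.gauss(1.0 if t == 1 else 0.0, 1.0) for t in y_true]

point = auc(y_true, scores)
lower, upper = bootstrap_auc_ci(y_true, scores)
print(f"AUC = {point:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the three reported values should be read.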

Conclusions: Diagnosis and prognosis related to depression can be predicted at the individual level by integrating low-cost variables, such as demographic and clinical data. Future studies should assess longer follow-up periods and incorporate biological predictors, such as genetics and blood biomarkers, to build more accurate tools for predicting the course of depression.

Keywords: Incident depression; machine learning; major depressive disorder; prognosis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Brazil / epidemiology
  • Cohort Studies
  • Depression* / diagnosis
  • Depression* / epidemiology
  • Humans
  • Incidence
  • Longitudinal Studies
  • Machine Learning*