A Machine Learning Approach for Detecting Digital Behavioral Patterns of Depression Using Nonintrusive Smartphone Data (Complementary Path to Patient Health Questionnaire-9 Assessment): Prospective Observational Study

JMIR Form Res. 2022 May 16;6(5):e37736. doi: 10.2196/37736.


Background: Depression is a major global cause of morbidity, an economic burden, and the greatest health challenge leading to chronic disability. Mobile monitoring of mental conditions has long been a sought-after metric to overcome the problems associated with the screening, diagnosis, and monitoring of depression and its heterogeneous presentation. The widespread availability of smartphones has made it possible to use their data to generate digital behavioral models that can be used for both clinical and remote screening and monitoring purposes. This study is novel as it adds to the field by conducting a trial using private and nonintrusive sensors that can help detect and monitor depression in a continuous, passive manner.

Objective: This study demonstrates a novel mental behavioral profiling metric (the Mental Health Similarity Score), derived from analyzing passively monitored, private, and nonintrusive smartphone use data, to identify and track depressive behavior and its progression.

Methods: Smartphone data sets and self-reported Patient Health Questionnaire-9 (PHQ-9) depression assessments were collected from 558 smartphone users on the Android operating system in an observational study over an average of 10.7 (SD 23.7) days. We quantified 37 digital behavioral markers from the passive smartphone data set and explored the relationship between the digital behavioral markers and depression using correlation coefficients and random forest models. We leveraged 4 supervised machine learning classification algorithms to predict depression and its severity using PHQ-9 scores as the ground truth. We also quantified an additional 3 digital markers from gyroscope sensors and explored their feasibility in improving the model's accuracy in detecting depression.

Results: The PHQ-9 2-class model (none vs severe) achieved the following metrics: precision of 85% to 89%, recall of 85% to 89%, F1 of 87%, and accuracy of 87%. The PHQ-9 3-class model (none vs mild vs severe) achieved the following metrics: precision of 74% to 86%, recall of 76% to 83%, F1 of 75% to 84%, and accuracy of 78%. A significant positive Pearson correlation was found between PHQ-9 questions 2, 6, and 9 within the severely depressed users and the mental behavioral profiling metric (r=0.73). The PHQ-9 question-specific model achieved the following metrics: precision of 76% to 80%, recall of 75% to 81%, F1 of 78% to 89%, and accuracy of 78%. When a gyroscope sensor was added as a feature, the Pearson correlation among questions 2, 6, and 9 decreased from 0.73 to 0.46. The PHQ-9 2-class model+gyro features achieved the following metrics: precision of 74% to 78%, recall of 67% to 83%, F1 of 72% to 78%, and accuracy of 76%.

Conclusions: Our results demonstrate that the Mental Health Similarity Score can be used to identify and track depressive behavior and its progression with high accuracy.

Keywords: depression; digital mental health; digital phenotyping; mobile phone.