Detecting depression of Chinese microblog users via text analysis: Combining Linguistic Inquiry Word Count (LIWC) with culture and suicide related lexicons

Front Psychiatry. 2023 Feb 9;14:1121583. doi: 10.3389/fpsyt.2023.1121583. eCollection 2023.

Abstract

Introduction: In recent years, research has used psycholinguistic features in public discourse, networking behaviors on social media, and profile information to train models for depression detection. However, the most widely adopted approach for extracting psycholinguistic features is to use the Linguistic Inquiry Word Count (LIWC) dictionary and various affective lexicons. Other features related to cultural factors and suicide risk have not been explored. Moreover, the use of social networking behavioral features and profile features would limit the generalizability of the model. Therefore, our study aimed to build a prediction model of depression for text-only social media data using a wider range of possible linguistic features related to depression, and to illuminate the relationship between linguistic expression and depression.

Methods: We collected 789 users' depression scores as well as their past posts on Weibo, and extracted a total of 117 lexical features via Simplified Chinese Linguistic Inquiry Word Count, Chinese Suicide Dictionary, Chinese Version of Moral Foundations Dictionary, Chinese Version of Moral Motivation Dictionary, and Chinese Individualism/Collectivism Dictionary.
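
As an illustration of dictionary-based lexical feature extraction, the sketch below computes, for each lexicon category, the share of a user's tokens that fall in that category. It is a minimal sketch under stated assumptions, not the study's pipeline: the lexicons `TOY_LEXICONS`, the category names, and the function `lexical_features` are hypothetical, and the posts are assumed to be already segmented into words.

```python
from collections import Counter

# Hypothetical toy lexicons for illustration only; the dictionaries named
# in the abstract are far larger and are not reproduced here.
TOY_LEXICONS = {
    "negative_emotion": {"难过", "痛苦", "孤独"},
    "death": {"死", "自杀"},
    "collectivism": {"我们", "大家"},
}


def lexical_features(tokens, lexicons=TOY_LEXICONS):
    """Return, per category, the fraction of tokens found in that category."""
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in lexicons.items():
            if tok in words:
                counts[category] += 1
    # Guard against empty input so the function never divides by zero.
    return {c: (counts[c] / total if total else 0.0) for c in lexicons}


# Assumed input: one user's posts, already segmented into word tokens.
tokens = ["我们", "今天", "很", "难过", "很", "孤独", "大家", "好"]
features = lexical_features(tokens)
```

Each user then becomes one row of category frequencies, which is the kind of feature vector a regression model can take as input.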

Results: All the dictionaries contributed to the prediction. The best-performing model was linear regression: the Pearson correlation coefficient between predicted values and self-reported values was 0.33, the R-squared was 0.10, and the split-half reliability was 0.75.
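
To make the two fit metrics concrete, the sketch below fits a one-feature least-squares line and computes the Pearson correlation and R-squared between predicted and observed scores. The data are made up for illustration, and the helper names (`fit_simple_ols`, `pearson_r`, `r_squared`) are hypothetical; the study's model used many features and its evaluation procedure is not described in the abstract.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up data: one lexical frequency per user and a self-reported score.
freq = [0.0, 0.1, 0.2, 0.3, 0.4]
score = [10, 14, 13, 20, 18]

slope, intercept = fit_simple_ols(freq, score)
predicted = [intercept + slope * f for f in freq]
r = pearson_r(predicted, score)
r2 = r_squared(score, predicted)
```

For an in-sample least-squares fit with an intercept, R-squared equals the square of the correlation between predicted and observed values. The reported figures are in line with this (0.33 squared is about 0.11, against a reported R-squared of 0.10), although exact equality need not hold when predictions are made on held-out data.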

Discussion: This study not only developed a predictive model applicable to text-only social media data, but also demonstrated the importance of taking cultural psychological factors and suicide-related expressions into consideration in the calculation of word frequency. Our research provided a more comprehensive understanding of how lexicons related to cultural psychology and suicide risk were associated with depression, and could contribute to the recognition of depression.

Keywords: CES-D; depression; machine learning; microblogging; prediction; text mining.

Grants and funding

This work was financially supported by the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDC02060300), the Scientific Foundation of Institute of Psychology, Chinese Academy of Sciences (No. E2CX4735YZ), Youth Innovation Promotion Association CAS, and the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDA27000000).