From linguistic analyses to large language models: A scoping review of methods used to investigate language features in depression research

Psychiatry Res. 2026 Jun:360:117064. doi: 10.1016/j.psychres.2026.117064. Epub 2026 Mar 3.

Abstract

Clinicians have long relied on communication to identify mental states, as many mental health conditions manifest in language. This scoping review examined the literature to understand and contextualize how studies have conceptualized, identified and analyzed linguistic features (specific attributes of language reflecting psychological dimensions) for depression diagnosis and assessment. From 27,967 records, 180 studies were included. Results reveal significant heterogeneity in methods and techniques used to link linguistic features with depression. Early studies (1979-2004) focused on traditional analyses of structural and semantic features; mid-period studies (2004-2015) adopted lexicon-based tools like LIWC; and recent studies (2015-2024) increasingly use natural language processing (NLP) and large language models, including word and sentence embeddings. A small subset (n = 7) explored generative AI models (e.g., GPT), indicating an emerging direction. This methodological trajectory reflects a shift from descriptive characterization of language patterns to predictive modeling of depression status through advanced computational modeling of language. However, 64.2% of samples were in English, and most studies lacked adequate reporting of demographic characteristics, limiting generalizability. Only 20.5% used gold-standard diagnostic tools, while most relied on self-reported scales with inconsistent cut-off scores. Our findings emphasize the need for more robust linguistic analysis models that balance computational power with clinical interpretability, providing actionable insights that bridge linguistic research and practical mental health applications. Addressing methodological inconsistencies, enhancing sample characterization, and expanding research to diverse populations and languages are critical steps toward improving the reliability and relevance of linguistic markers in depression.
This review underscores the importance of interdisciplinary approaches that combine cutting-edge technology with sound clinical practice to advance understanding and support evidence-based decision-making in mental health care.

Keywords: Computational linguistics; Depression; Linguistic markers; Natural language processing.

Publication types

  • Scoping Review
  • Review

MeSH terms

  • Depression* / diagnosis
  • Depressive Disorder* / diagnosis
  • Humans
  • Language*
  • Large Language Models
  • Linguistics* / methods
  • Natural Language Processing*