A multi-stage large language model framework for extracting suicide-related social determinants of health

Commun Med (Lond). 2025 Sep 29;5(1):404. doi: 10.1038/s43856-025-01114-z.

Abstract

Background: Understanding the social determinants of health (SDoH) factors that contribute to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, the difficulty of analyzing pivotal stressors preceding suicide incidents, and limited model explainability.

Methods: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.

Results: Our proposed framework improves performance both in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.
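
The abstract does not spell out the individual stages, so the following is only a minimal sketch of the general idea of a multi-stage design: a first stage retrieves relevant context, and a second stage extracts factors from that context while keeping the retrieved sentences as an intermediate explanation. Everything here is an assumption made for illustration: the names `FACTOR_KEYWORDS`, `retrieve_context`, `extract_factors`, and `run_pipeline`, the factor labels, and the keyword matching, which is a simple stand-in for the language model calls and is not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical keyword lexicon standing in for a language model.
# The factor labels and phrases are illustrative, not taken from the paper.
FACTOR_KEYWORDS = {
    "job_problem": ("laid off", "lost his job", "unemployed"),
    "relationship_problem": ("divorce", "breakup", "argument with"),
    "financial_problem": ("debt", "eviction", "bankruptcy"),
}


@dataclass
class Extraction:
    factor: str          # extracted SDoH factor label
    evidence: list[str]  # retrieved sentences that justify the label


def retrieve_context(text: str) -> list[str]:
    """Stage 1 (assumed): keep only sentences that mention a candidate stressor."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    keywords = [k for group in FACTOR_KEYWORDS.values() for k in group]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def extract_factors(context: list[str]) -> list[Extraction]:
    """Stage 2 (assumed): label factors using only the retrieved context."""
    results = []
    for factor, keywords in FACTOR_KEYWORDS.items():
        evidence = [s for s in context if any(k in s.lower() for k in keywords)]
        if evidence:
            results.append(Extraction(factor, evidence))
    return results


def run_pipeline(text: str) -> list[Extraction]:
    """Chain the stages; the evidence field is the intermediate explanation."""
    return extract_factors(retrieve_context(text))


if __name__ == "__main__":
    note = "The person was laid off in March. He enjoyed fishing. He was facing eviction."
    for item in run_pipeline(note):
        print(item.factor, "<-", item.evidence)
```

Because each extracted factor carries the sentences it was derived from, a human annotator can check the label against its evidence rather than rereading the full text, which is the kind of transparency the intermediate explanations are meant to provide.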

Conclusions: Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.

Plain language summary

Social determinants of health (SDoH) are the circumstances in which people are born, grow, live, work, and age that can have an impact on their health and well-being. We aimed to improve how SDoH factors that contribute to suicide incidents are identified. We developed a computational large language model framework that can extract suicide-related SDoH factors from unstructured text. Our approach was evaluated against other advanced language models and found to perform better in extracting SDoH factors and explaining its decisions. By making the SDoH factor extraction process more accurate and transparent, this work can help experts more quickly recognize individuals at risk of suicide and support better prevention strategies.