Assessment of machine learning algorithms in national data to classify the risk of self-harm among young adults in hospital: A retrospective study

Anmol Arora; Louis Bojko; Santosh Kumar; Joseph Lillington; Sukhmeet Panesar; Bruno Petrungaro

doi:10.1016/j.ijmedinf.2023.105164

Int J Med Inform. 2023 Sep:177:105164. doi: 10.1016/j.ijmedinf.2023.105164. Epub 2023 Jul 25.

Authors

Anmol Arora¹, Louis Bojko², Santosh Kumar², Joseph Lillington², Sukhmeet Panesar³, Bruno Petrungaro²

Affiliations

¹ School of Clinical Medicine, University of Cambridge, Cambridge, UK; Health Economics Unit, NHS Midlands and Lancashire Commissioning Support Unit, Leyland, UK. Electronic address: anmol.arora@nhs.net.
² Health Economics Unit, NHS Midlands and Lancashire Commissioning Support Unit, Leyland, UK.
³ Senior Adviser, Office of Chief Data and Analytics Officer, NHS England and NHS Improvement, UK.

PMID: 37516036

Abstract

Background: Self-harm is one of the most common presentations at accident and emergency departments in the UK and is a strong predictor of suicide risk. The UK Government has prioritised identifying risk factors and developing preventative strategies for self-harm. Machine learning offers a potential method to identify complex patterns with predictive value for the risk of self-harm.

Methods: National data in the UK Mental Health Services Data Set were extracted for patients aged 18-30 years who started a mental health hospital admission between Aug 1, 2020, and Aug 1, 2021, and had been discharged by Jan 1, 2022. Data were obtained on age group, gender, ethnicity, employment status, marital status, accommodation status, and source of admission to hospital. These variables were used to construct seven machine learning models, which were applied individually and as an ensemble to predict hospital stays associated with a risk of self-harm.
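
The inclusion criteria in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the field names (`age`, `admitted`, `discharged`) and the function `is_eligible` are hypothetical, and treating the date boundaries as inclusive is an assumption, since the abstract does not say. The abstract also lists "age group" as a variable, so the real data may hold age bands rather than exact ages.

```python
from datetime import date

# Study windows taken from the abstract. Inclusive boundaries are an assumption.
ADMISSION_START = date(2020, 8, 1)
ADMISSION_END = date(2021, 8, 1)
DISCHARGE_CUTOFF = date(2022, 1, 1)


def is_eligible(age, admitted, discharged):
    """Return True if a hospital stay meets the inclusion criteria (hypothetical helper)."""
    if discharged is None:
        # Still in hospital, so not discharged by the cutoff date.
        return False
    return (
        18 <= age <= 30
        and ADMISSION_START <= admitted <= ADMISSION_END
        and discharged <= DISCHARGE_CUTOFF
    )


# Invented example records, used only to exercise the filter.
stays = [
    {"age": 25, "admitted": date(2020, 9, 1), "discharged": date(2020, 10, 1)},
    {"age": 31, "admitted": date(2020, 9, 1), "discharged": date(2020, 10, 1)},
    {"age": 22, "admitted": date(2021, 7, 15), "discharged": date(2022, 2, 1)},
    {"age": 19, "admitted": date(2021, 3, 3), "discharged": None},
]

eligible = [s for s in stays if is_eligible(**s)]
print(len(eligible))
```

Only the first invented record passes: the second is outside the age range, the third was discharged after the cutoff, and the fourth has no discharge date.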

Outcomes: The training dataset included 23 808 items (including 1081 episodes of self-harm) and the testing dataset 5951 items (including 270 episodes of self-harm). The best-performing algorithms were the random forest model (AUC-ROC 0.70, 95% CI 0.66-0.74) and the ensemble model (AUC-ROC 0.77, 95% CI 0.75-0.79).
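
The counts reported in the Outcomes imply a split ratio and a class balance that the abstract does not state directly. The short Python sketch below works through that arithmetic. The variable names are illustrative, and the closing comment is an inference from the numbers rather than something the abstract reports.

```python
# Counts as reported in the abstract.
train_items, train_self_harm = 23808, 1081
test_items, test_self_harm = 5951, 270

total_items = train_items + test_items
total_self_harm = train_self_harm + test_self_harm

# Share of all items allocated to the training set.
train_fraction = train_items / total_items

# Proportion of items with a self-harm episode in each partition.
prevalence_train = train_self_harm / train_items
prevalence_test = test_self_harm / test_items
prevalence_all = total_self_harm / total_items

print(f"total items: {total_items}")
print(f"training share: {train_fraction:.1%}")
print(f"self-harm, training: {prevalence_train:.2%}")
print(f"self-harm, testing: {prevalence_test:.2%}")
print(f"self-harm, overall: {prevalence_all:.2%}")

# The figures are consistent with an approximately 80/20 split that preserves
# the outcome rate, but the abstract does not say how the split was made.
```

With self-harm recorded in roughly 4.5% of items, the outcome is heavily imbalanced. This is one reason to report a threshold-independent measure such as AUC-ROC rather than raw accuracy.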

Interpretation: Machine learning algorithms could predict hospital stays with a high risk of self-harm based on readily available data that are routinely collected by health providers and recorded in the Mental Health Services Data Set. The findings should be validated externally with other real-world, prospective data.

Funding: This study was supported by the Midlands and Lancashire Commissioning Support Unit.

Keywords: Algorithmic bias; Artificial intelligence; Deep learning; Generalisability; Neural networks; Psychiatry; Risk stratification; Statistical models.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Hospitals
Humans
Machine Learning
Prospective Studies
Retrospective Studies
Risk Assessment
Self-Injurious Behavior* / diagnosis
Self-Injurious Behavior* / epidemiology
Self-Injurious Behavior* / psychology
Young Adult