Methodological and Quality Flaws in the Use of Artificial Intelligence in Mental Health Research: Systematic Review

JMIR Ment Health. 2023 Feb 2:10:e42045. doi: 10.2196/42045.


Background: Artificial intelligence (AI) is giving rise to a revolution in medicine and health care. Mental health conditions are highly prevalent in many countries, and the COVID-19 pandemic has increased the risk of further erosion of the mental well-being in the population. Therefore, it is relevant to assess the current status of the application of AI toward mental health research to inform about trends, gaps, opportunities, and challenges.

Objective: This study aims to perform a systematic overview of AI applications in mental health in terms of methodologies, data, outcomes, performance, and quality.

Methods: A systematic search in PubMed, Scopus, IEEE Xplore, and Cochrane databases was conducted to collect records of use cases of AI for mental health disorder studies from January 2016 to November 2021. Records were screened for eligibility if they were a practical implementation of AI in clinical trials involving mental health conditions. Records of AI study cases were evaluated and categorized by the International Classification of Diseases 11th Revision (ICD-11). Data related to trial settings, collection methodology, features, outcomes, and model development and evaluation were extracted following the CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) guideline. Further, evaluation of risk of bias is provided.

Results: A total of 429 nonduplicated records were retrieved from the databases and 129 were included for a full assessment-18 of which were manually added. The distribution of AI applications in mental health was found unbalanced between ICD-11 mental health categories. Predominant categories were Depressive disorders (n=70) and Schizophrenia or other primary psychotic disorders (n=26). Most interventions were based on randomized controlled trials (n=62), followed by prospective cohorts (n=24) among observational studies. AI was typically applied to evaluate quality of treatments (n=44) or stratify patients into subgroups and clusters (n=31). Models usually applied a combination of questionnaires and scales to assess symptom severity using electronic health records (n=49) as well as medical images (n=33). Quality assessment revealed important flaws in the process of AI application and data preprocessing pipelines. One-third of the studies (n=56) did not report any preprocessing or data preparation. One-fifth of the models were developed by comparing several methods (n=35) without assessing their suitability in advance and a small proportion reported external validation (n=21). Only 1 paper reported a second assessment of a previous AI model. Risk of bias and transparent reporting yielded low scores due to a poor reporting of the strategy for adjusting hyperparameters, coefficients, and the explainability of the models. International collaboration was anecdotal (n=17) and data and developed models mostly remained private (n=126).

Conclusions: These significant shortcomings, alongside the lack of information to ensure reproducibility and transparency, are indicative of the challenges that AI in mental health needs to face before contributing to a solid base for knowledge generation and for being a support tool in mental health management.

Keywords: artificial intelligence; health research; mental health; research methodology; research quality; review methodology; systematic review; trial methodology.

Publication types

  • Review