Using random forest to detect multiple inherited metabolic diseases simultaneously based on GC-MS urinary metabolomics

Talanta. 2021 Dec 1;235:122720. doi: 10.1016/j.talanta.2021.122720. Epub 2021 Jul 19.

Abstract

Inborn errors of metabolism, also known as inherited metabolic diseases (IMDs), are linked to genetic mutations and cause biochemical metabolic disorders in newborns, sometimes even sudden infant death. Timely detection and diagnosis of IMDs are therefore important for improving newborn survival. Here we propose a strategy for detecting six types of IMDs simultaneously by combining GC-MS with the random forest (RF) algorithm. Clinical urine samples from IMD patients and healthy subjects are analyzed by GC-MS to acquire metabolomics data, and an RF model is then built as a multi-class classifier for the GC-MS data. Compared with models built using an artificial neural network and a support vector machine, the RF model showed superior specificity, sensitivity, precision, accuracy, and Matthews correlation coefficient in identifying all six types of IMDs as well as normal samples. The proposed strategy offers a useful method for reliable and effective identification of multiple IMDs in clinical diagnosis.
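The five figures of merit named above are usually computed per class in a multi-class setting (here, six IMD types plus normal) by treating each class one-vs-rest against a confusion matrix. The abstract does not say exactly how the paper defines or averages them, so the following is only a minimal sketch assuming one-vs-rest definitions. The class labels (`normal`, `IMD_A`, `IMD_B`) and the predictions are hypothetical toy values, not the study's data; in practice the predictions would come from a fitted classifier such as scikit-learn's `RandomForestClassifier`.

```python
import math
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero
    # (e.g. a class that was never predicted).
    return num / den if den else 0.0


def one_vs_rest_metrics(matrix: List[List[int]],
                        labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-class sensitivity, specificity, precision, accuracy and MCC (one-vs-rest)."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class is another class
        tn = total - tp - fn - fp                 # everything else
        mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        results[label] = {
            "sensitivity": _safe_div(tp, tp + fn),
            "specificity": _safe_div(tn, tn + fp),
            "precision": _safe_div(tp, tp + fp),
            "accuracy": _safe_div(tp + tn, total),
            "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
        }
    return results


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    labels = ["normal", "IMD_A", "IMD_B"]
    y_true = ["normal", "normal", "normal", "IMD_A", "IMD_A", "IMD_B", "IMD_B", "IMD_B"]
    y_pred = ["normal", "normal", "IMD_A", "IMD_A", "IMD_A", "IMD_B", "IMD_B", "normal"]
    cm = confusion_matrix(y_true, y_pred, labels)
    for label, scores in one_vs_rest_metrics(cm, labels).items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
```

Reporting per-class values (rather than a single pooled score) shows whether a model performs well on every IMD type and on normal samples, which is how the abstract phrases its comparison. A macro-average over the classes is one common way to summarize them, but whether the paper used it is not stated.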

Keywords: GC-MS; Inherited metabolic diseases; Metabolomics analysis; Multi-classification; Random forest.

MeSH terms

  • Algorithms
  • Gas Chromatography-Mass Spectrometry
  • Humans
  • Infant
  • Infant, Newborn
  • Metabolic Diseases*
  • Metabolomics