The impact of methodological choices when developing predictive models using urinary metabolite data

Stat Med. 2022 Aug 15;41(18):3511-3526. doi: 10.1002/sim.9431. Epub 2022 May 14.


The continuous evolution of metabolomics over the past two decades has stimulated the search for metabolic biomarkers of many diseases. Metabolomic data measured from urinary samples can provide rich information about the biological events triggered by organ rejection in pediatric kidney transplant recipients. With additional validation, metabolic markers can be used to build clinically useful diagnostic tools. However, many methodological steps, ranging from data processing to modeling, can influence the performance of the resulting metabolomic classifiers. In this study, we focus on comparing classification methods that can handle the complex structure of metabolomic data, including regularized classifiers, partial least squares discriminant analysis, and nonlinear classification models. We also examine the effectiveness of a physiological normalization technique that is widely used in the clinical and biochemical literature but has not been extensively analyzed or compared in urine metabolomic studies. While the main objective of this work is to interrogate metabolomic data from pediatric kidney transplant recipients to improve the diagnosis of T cell-mediated rejection (TCMR), we also analyze three independent datasets from other disease conditions to investigate the generalizability of our findings.
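
The abstract does not name the physiological normalization technique it evaluates. As an illustration only, the sketch below assumes a common approach for urine samples: dividing each metabolite concentration by a reference analyte measured in the same sample (creatinine is a frequent choice) so that differences in urine dilution cancel out, then taking a log of the ratio. The function name, the metabolite names, and all numbers are hypothetical and are not taken from the study.

```python
import math


def normalize_to_reference(metabolites, reference, log_transform=True):
    """Express each metabolite as a ratio to a reference analyte.

    `metabolites` maps metabolite name -> concentration in one urine sample.
    `reference` is the concentration of the reference analyte in the same
    sample (assumed here to be creatinine; the abstract does not say so).
    """
    if reference <= 0:
        raise ValueError("reference concentration must be positive")
    ratios = {name: value / reference for name, value in metabolites.items()}
    if log_transform:
        # A zero concentration has no finite logarithm, so mark it as missing.
        return {
            name: (math.log2(ratio) if ratio > 0 else None)
            for name, ratio in ratios.items()
        }
    return ratios


# Two hypothetical samples with the same underlying profile;
# the second is diluted twofold, so every concentration is halved.
concentrated = {"metabolite_a": 40.0, "metabolite_b": 10.0}
diluted = {"metabolite_a": 20.0, "metabolite_b": 5.0}

norm_concentrated = normalize_to_reference(concentrated, reference=8.0)
norm_diluted = normalize_to_reference(diluted, reference=4.0)
```

In this toy example, the two normalized profiles coincide because the dilution factor appears in both the numerator and the denominator of every ratio. Whether such a normalization helps or hurts classifier performance on real data is the kind of question the study investigates.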

Keywords: T cell-mediated rejection; machine learning; predictive modeling; sample quality; urinary metabolites.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / urine
  • Child
  • Discriminant Analysis
  • Humans
  • Kidney Transplantation*
  • Least-Squares Analysis
  • Metabolomics / methods

Substances

  • Biomarkers