MB-SupCon: Microbiome-based Predictive Models via Supervised Contrastive Learning

J Mol Biol. 2022 Aug 15;434(15):167693. doi: 10.1016/j.jmb.2022.167693. Epub 2022 Jun 28.

Abstract

The human microbiome consists of trillions of microorganisms. Microbiota can modulate host physiology through molecule and metabolite interactions. Integrating microbiome and metabolomics data has the potential to predict different diseases more accurately. Yet most datasets measure only microbiome data, without paired metabolome data. Here, we propose a novel integrative modeling framework, the Microbiome-based Supervised Contrastive Learning Framework (MB-SupCon). MB-SupCon integrates microbiome and metabolome data to generate microbiome embeddings, which can be used to improve prediction accuracy in datasets that measure only microbiome data. As a proof of concept, we applied MB-SupCon to 720 samples with paired 16S microbiome data and metabolomics data from patients with type 2 diabetes. MB-SupCon outperformed existing prediction methods and achieved high average prediction accuracies for insulin resistance status (84.62%), sex (78.98%), and race (80.04%). Moreover, the microbiome embeddings form separable clusters for different covariate groups in the lower-dimensional space, which enhances data visualization. We also applied MB-SupCon to a large inflammatory bowel disease study and observed similar advantages. Thus, MB-SupCon could be broadly applicable for improving microbiome prediction models in multi-omics disease studies.
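
The abstract does not spell out the loss function, so the following is only a minimal sketch of how a supervised contrastive objective over paired microbiome and metabolome embeddings might look. It assumes a generic SupCon-style cross-modal loss in which a microbiome embedding is pulled toward the metabolome embeddings of samples sharing its covariate label. The function names, the temperature value, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def cross_modal_supcon_loss(z_microbiome, z_metabolome, labels, temperature=0.5):
    """Hypothetical supervised contrastive loss across two modalities.

    Anchors are microbiome embeddings; positives are metabolome embeddings
    of samples with the same label (this always includes the paired sample).
    This is an assumed, generic formulation, not the published MB-SupCon loss.
    """
    za = l2_normalize(np.asarray(z_microbiome, dtype=float))
    zb = l2_normalize(np.asarray(z_metabolome, dtype=float))
    labels = np.asarray(labels)

    # Pairwise similarities between every microbiome and metabolome embedding.
    logits = za @ zb.T / temperature
    # Subtract the row-wise maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives[i, j] is True when samples i and j share a covariate label.
    positives = labels[:, None] == labels[None, :]
    per_anchor = -(log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy covariate, e.g. a binary status
    z_mb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    aligned = cross_modal_supcon_loss(z_mb, z_mb.copy(), labels)
    mismatched = cross_modal_supcon_loss(z_mb, z_mb[::-1].copy(), labels)
    print(f"aligned={aligned:.3f} mismatched={mismatched:.3f}")
```

In this toy setup the loss is lower when same-label samples have similar embeddings in both modalities. In a framework of this kind, encoders trained against such an objective would then let the microbiome encoder alone produce embeddings for datasets that lack metabolome measurements.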

Keywords: Contrastive learning; Microbiome; Prediction model; Supervised learning.

MeSH terms

  • Diabetes Mellitus, Type 2 / genetics
  • Diabetes Mellitus, Type 2 / microbiology
  • Humans
  • Inflammatory Bowel Diseases / genetics
  • Inflammatory Bowel Diseases / microbiology
  • Metabolome*
  • Metabolomics / methods
  • Microbiota*
  • RNA, Ribosomal, 16S / genetics
  • Supervised Machine Learning*

Substances

  • RNA, Ribosomal, 16S