Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles

Methods Mol Biol. 2016:1415:509-31. doi: 10.1007/978-1-4939-3572-7_26.


Metagenomics projects use next-generation sequencing to unravel the genetic potential of microbial communities from a wealth of environmental niches, including those associated with the human body and relevant to human health. To understand the large datasets collected in metagenomics surveys, and to interpret them in the context of how community metabolism as a whole adapts to and interacts with the environment, it is necessary to go beyond the conventional approach of decomposing metagenomes into their constituent microbial species and analyzing each component separately. By applying the concept of translational optimization through codon usage adaptation to entire metagenomic datasets, we demonstrate that a codon usage bias present throughout the entire microbial community can be used as a powerful analytical tool to predict community lifestyle-specific metabolism. Here we demonstrate this approach, combined with machine learning, to classify human gut microbiome samples according to the pathological condition diagnosed in the human host.
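As a rough illustration of the kind of feature such an approach might start from, the sketch below pools all open reading frames of one sample and computes the relative frequency of each of the 64 codons. It is a minimal example under stated assumptions, not the chapter's actual pipeline: the function name `codon_usage_profile`, the toy sequences, and the choice of simple pooled frequencies are all hypothetical. A vector like this, computed per sample, could in principle serve as input to a classifier such as the random forests named in the keywords.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sample yields the same feature layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage_profile(orfs):
    """Return the relative frequency of each codon, pooled over all ORFs.

    `orfs` is an iterable of in-frame coding sequences (hypothetical input;
    gene calling is assumed to have been done elsewhere).
    """
    counts = Counter()
    for seq in orfs:
        seq = seq.upper()
        # Walk the sequence three bases at a time; ignore a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip codons containing ambiguous bases such as N.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


# Toy data standing in for the ORFs of a single metagenomic sample.
sample_orfs = ["ATGGCTGCTTAA", "ATGGCCAAATGA"]
profile = codon_usage_profile(sample_orfs)
```

Keeping the codon order fixed means profiles from different samples can be stacked directly into a samples-by-codons matrix for downstream learning.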

Keywords: Cirrhosis; Enrichment analysis; Human metagenome; Random forests; Translational optimization; Variable selection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon
  • Data Mining
  • Evolution, Molecular
  • Gastrointestinal Microbiome
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Liver Cirrhosis / microbiology*
  • Machine Learning
  • Metagenomics / methods*
  • Phylogeny

