Measuring metagenome diversity and similarity with Hill numbers

Mol Ecol Resour. 2018 Nov;18(6):1339-1355. doi: 10.1111/1755-0998.12923. Epub 2018 Jul 27.

Abstract

The first step of any metagenome sequencing project is to obtain an inventory of OTU (operational taxonomic unit) abundances and/or metagenomic gene abundances. The former is generated with 16S-rRNA-tagged amplicon sequencing, and the latter can be generated with either gene-targeted or whole-sample shotgun metagenomics. For 16S-rRNA data sets, measuring community diversity with indexes such as species richness and Shannon's index has become a de facto standard analysis; similarly comprehensive approaches for metagenomic gene abundances are still largely missing, even though both OTU and gene abundances are derived from DNA reads. Here, we adapt the Hill numbers, which were recently reintroduced to macrocommunity ecology and are now widely regarded as a highly appropriate measurement system for ecological diversity, to measure metagenome alpha-, beta- and gamma-diversities, and similarity. Our proposal includes the following: (a) metagenomic gene (MG) diversity, which measures metagenome diversity at the single-gene level; (b) Type-I metagenome functional gene cluster (MFGC) diversity, which measures the diversity of functional gene clusters but ignores within-cluster gene abundance information; (c) Type-II MFGC diversity, which takes within-cluster gene abundance information into account and integrates gene-cluster-level metagenome diversity with functional gene redundancy information; and (d) four classes of Hill-numbers-based similarity metrics (local gene overlap, regional gene overlap, gene homogeneity measure and gene turnover complement), defined for both MG and MFGC. We demonstrate the proposal with gut metagenomes from healthy and IBD (inflammatory bowel disease) cohorts. The Hill numbers offer a unified approach to cohesively and comprehensively measuring the ecological and metagenome diversities of microbiomes.
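
For readers unfamiliar with Hill numbers, the following minimal sketch computes the standard Hill number of order q, (sum of p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of Shannon entropy, from a single vector of abundances. It is an illustration of the general definition only, not the authors' implementation of MG or MFGC diversity; the function name `hill_number` and the read counts are hypothetical.

```python
import math


def hill_number(abundances, q):
    """Hill number of order q for a vector of non-negative abundances.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Relative abundances of the entities (genes or OTUs) actually present.
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is
        # the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical read counts for five genes in one metagenome sample.
gene_reads = [50, 30, 15, 5, 0]
for q in (0, 1, 2):
    print(f"q={q}: {hill_number(gene_reads, q):.3f}")
```

At q = 0 the result is simply the number of genes detected (richness); higher orders give progressively more weight to abundant genes, so a sample dominated by a few genes scores lower at q = 2 than at q = 0.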

Keywords: Hill numbers; medical ecology; metagenome diversity; metagenome functional gene cluster diversity; metagenome similarity; metagenomic gene diversity.

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods*
  • DNA, Ribosomal / chemistry
  • DNA, Ribosomal / genetics
  • Genetic Variation*
  • Metagenome*
  • Metagenomics / methods*
  • Phylogeny*
  • RNA, Ribosomal, 16S / genetics
  • Sequence Analysis, DNA

Substances

  • DNA, Ribosomal
  • RNA, Ribosomal, 16S