Analysis and evaluation of different sequencing depths from 5 to 20 million reads in shotgun metagenomic sequencing, with optimal minimum depth being recommended

Genome. 2022 Sep 1;65(9):491-504. doi: 10.1139/gen-2021-0120. Epub 2022 Sep 6.

Abstract

This study aimed to analyze and evaluate the impact of shotgun metagenomic sequencing depths from 5 to 20 million reads in metagenome-wide association studies (MWASs), and to determine the optimal minimum sequencing depth. We included 200 previously published gut microbial shotgun metagenomic sequencing datasets from an obesity study (100 obese vs. 100 non-obese). Reads from data with original sequencing depths >20 million were downsampled into seven experimental groups with depths from 5 to 20 million (in steps of 2.5 million). Using both integrated gene cluster (IGC) and metagenomic phylogenetic analysis 2 (MetaPhlAn2), we obtained and analyzed the read matching rates, gene count, species richness and abundance, diversity, and clinical biomarkers of the experimental groups, with the original-depth data as the control group. An additional set of 100 published datasets from a colorectal cancer (CRC) study was included for validation (50 CRC vs. 50 CRC-free). More genes and species were identified as sequencing depth increased. At depths of 15 million or higher, species richness became more stable, with a rate of change of 5% or lower, and species composition became more stable, with an intraclass correlation coefficient (ICC) higher than 0.75. For species abundance, 81% and 97% of species showed significant differences among all groups (p < 0.05) in IGC and MetaPhlAn2, respectively. Diversity differed significantly across all groups, and the difference in diversity between the experimental and control groups decreased as sequencing depth increased. The area under the receiver operating characteristic curve (AUC) of the obesity classifier on the obesity test samples showed an increasing trend with sequencing depth (τ = 0.29). The validation results were consistent with these findings.
Our study found that higher sequencing depth provides more information on microbial structure and composition. At sequencing depths of 15 million or higher, we obtained more stable species compositions and disease classifiers with good performance. Therefore, we recommend 15 million as the optimal minimum sequencing depth for an MWAS.
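
The downsampling design described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`depth_grid`, `downsample`, `richness_change_rates`) are hypothetical, and the "rate of change" is assumed here to be the relative change in species richness between consecutive depths, which the abstract does not define.

```python
import random


def depth_grid(start=5_000_000, stop=20_000_000, step=2_500_000):
    """Return the target depths: 5 to 20 million in steps of 2.5 million."""
    return list(range(start, stop + 1, step))


def downsample(read_ids, depth, seed=0):
    """Randomly draw `depth` reads without replacement (seeded for reproducibility)."""
    if depth > len(read_ids):
        raise ValueError("target depth exceeds the number of available reads")
    rng = random.Random(seed)
    return rng.sample(read_ids, depth)


def richness_change_rates(richness_by_depth):
    """Relative change in richness between consecutive depths.

    Assumption: this is one plausible reading of the 'rate of change';
    the abstract does not give the exact formula.
    """
    depths = sorted(richness_by_depth)
    return {
        d2: (richness_by_depth[d2] - richness_by_depth[d1]) / richness_by_depth[d1]
        for d1, d2 in zip(depths, depths[1:])
    }


if __name__ == "__main__":
    # Seven experimental groups, matching the design in the abstract.
    print(depth_grid())
    # Toy-scale example: 1,000 placeholder read IDs downsampled to 250.
    subset = downsample(list(range(1_000)), 250)
    print(len(subset))
```

Under this assumed definition, a depth would count as "stable" once the value returned by `richness_change_rates` for that depth falls to 0.05 or lower.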

Keywords: disease classifier; metagenome-wide association study; minimum sequencing depth; sequencing depth.

MeSH terms

  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Metagenome*
  • Metagenomics* / methods
  • Obesity / genetics
  • Phylogeny