Gigascience. 2022 Nov 18:11:giac116.
doi: 10.1093/gigascience/giac116.

Improved microbial genomes and gene catalog of the chicken gut from metagenomic sequencing of high-fidelity long reads

Free PMC article

Yan Zhang et al. Gigascience. 2022.

Abstract

Background: Due to the importance of chicken production and the remarkable influence of the gut microbiota on host health and growth, tens of thousands of metagenome-assembled genomes (MAGs) have been constructed for the chicken gut microbiome. However, due to the limitations of short-read sequencing and assembly technologies, most of these MAGs are far from complete, are of lower quality, and include contaminant reads.

Results: We generated 332 Gb of high-fidelity (HiFi) long reads from the 5 chicken intestinal compartments and assembled 461 and 337 microbial genomes, of which 53% and 55% are circular, at the strain and species levels, respectively. For the assembled microbial genomes, approximately 95% were regarded as complete according to the "RNA complete" criteria, which require at least 1 full-length ribosomal RNA (rRNA) operon encoding all 3 types of rRNA (16S, 23S, and 5S) and at least 18 copies of full-length transfer RNA genes. In comparison with the short-read-derived chicken MAGs, 384 (83% of 461) strain-level and 89 (26% of 337) species-level genomes in this study are novel, with no matches to previously reported sequences. At the gene level, one-third of the 2.5 million genes in the HiFi-derived gene catalog are novel and cannot be matched to the short-read-derived gene catalog. Moreover, the HiFi-derived genomes have much higher continuity and completeness, as well as lower contamination; the HiFi-derived gene catalog has a much higher ratio of complete gene structures. The dominant phylum in our HiFi-assembled genomes was Firmicutes (82.5%), and the foregut was highly enriched in 5 genera: Ligilactobacillus, Limosilactobacillus, Lactobacillus, Weissella, and Enterococcus, all of which belong to the order Lactobacillales. Using GTDB-Tk, all 337 species-level genomes were successfully classified at the order level; however, 2, 35, and 189 genomes could not be classified into any known family, genus, and species, respectively. Among these incompletely classified genomes, 9 and 49 may belong to novel genera and species, respectively, because their 16S rRNA genes have identities lower than 95% and 97% to any known 16S rRNA genes.
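The "RNA complete" criteria stated above can be written as a simple check. The following is a minimal sketch, not the authors' pipeline: the `RnaGeneCounts` record and its field names are hypothetical, and it assumes the rRNA operon and tRNA counts have already been obtained from some annotation step.

```python
from dataclasses import dataclass


@dataclass
class RnaGeneCounts:
    """Hypothetical per-genome summary; field names are illustrative only."""
    full_rrna_operons: int   # operons encoding 16S, 23S, and 5S rRNA
    full_length_trnas: int   # copies of full-length tRNA genes


def is_rna_complete(counts: RnaGeneCounts,
                    min_operons: int = 1,
                    min_trnas: int = 18) -> bool:
    """Apply the thresholds quoted in the abstract (>=1 operon, >=18 tRNAs)."""
    return (counts.full_rrna_operons >= min_operons
            and counts.full_length_trnas >= min_trnas)
```

For instance, a genome with 2 full operons but only 17 full-length tRNA genes would not pass under this reading of the criteria.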

Conclusions: HiFi sequencing not only produced metagenome assemblies and gene structures with markedly improved quality but also recovered a substantial portion of novel genomes and genes that were missed in previous short-read-based metagenome studies. The novel genomes and species obtained in this study will facilitate gut microbiome and host-microbiota interaction studies, thereby contributing to the sustainable development of poultry resources.

Keywords: Chicken gut; Gene catalog; Metagenome-assembled genomes; PacBio HiFi sequencing.

Figures

Figure 1:
Graphic display of the contig assembly graph. Random colors were chosen for different contigs. The line length is proportional to the contig length, and the line width is proportional to the contig coverage depth. Some examples for super complex, tangled circular, individual circular, and linear contigs are labeled. This plot shows the colorectum assembly drawn by Bandage.
Figure 2:
Contig assembly statistics. (A) Histogram of total assembled contig sizes for each intestinal compartment. (B) Histogram of N50 contig sizes for each intestinal compartment. (C) Correlation plot of contig length and coverage depth, generated using contig data from all intestinal compartments. The red marker line indicates that sufficient coverage depth contributes to contig continuity. (D) Correlation plot of the YAK quality score (QV) and coverage depth, using contigs with lengths over 100 kb from all intestinal compartments. The k-mer frequency was calculated with the parameters “yak count -b37 -t48” and the yak QV was calculated with the parameters “yak qv -t80 -p -K3.2g -l100k”. The red marker line indicates that a higher coverage depth improves the single-base quality of the contig sequences.
Figure 3:
Evaluation and ranking of assembled microbial genomes. (A) “Circular genomes” refers to circular contigs, and “noncircular MAGs” refers to incomplete genome assemblies derived from contig binning or merging algorithms. A circular genome or noncircular MAG is defined as “near complete” if its CheckM completeness is ≥90% and its contamination level is ≤5%, defined as “high quality” if completeness is ≥70% and contamination is ≤10%, or defined as “medium quality” if completeness is ≥50% and contamination is ≤10%. Combined (NR) is the nonredundant set of microbial genomes from all intestinal compartments. All the microbial genomes in Combined (NR) have ≤99% average nucleotide identity to the other microbial genomes in Combined (NR). (B) Distribution of the assembled microbial genome sizes for circular genomes and noncircular MAGs. (C) Distribution of the CheckM scores (completeness – 5 * contamination) for circular genomes and noncircular MAGs.
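The quality ranks and the CheckM score defined in this caption can be expressed compactly. This is an illustrative sketch only: the function names are hypothetical, inputs are assumed to be percentages as reported by CheckM, and the ranks are assumed to be tested from strictest to loosest.

```python
from typing import Optional


def checkm_rank(completeness: float, contamination: float) -> Optional[str]:
    """Return the quality rank from the Figure 3 thresholds, or None if unranked."""
    if completeness >= 90 and contamination <= 5:
        return "near complete"
    if completeness >= 70 and contamination <= 10:
        return "high quality"
    if completeness >= 50 and contamination <= 10:
        return "medium quality"
    return None


def checkm_score(completeness: float, contamination: float) -> float:
    """CheckM score as given in panel (C): completeness - 5 * contamination."""
    return completeness - 5 * contamination
```

Under these thresholds, a genome with 95% completeness and 8% contamination falls into "high quality" rather than "near complete", because its contamination exceeds 5%.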
Figure 4:
Statistics of noncoding RNA genes in assembled microbial genomes. (A) Distribution of the number of full rRNA operons (i.e., those that encode 5S, 16S, and 23S rRNA). (B) Distribution of the number of tRNA genes.
Figure 5:
Comparison of HiFi-assembled microbial genomes with short-read assembled MAGs. (A) Matching of our 461 assembled microbial strain-level genomes (99% average nucleotide identity, ANI) with 12,339 public dereplicated MAGs (99% ANI) derived from short reads. A HiFi-assembled microbial genome was considered a match if its ANI to any short-read assembled MAG was higher than 99%. (B) Matching of our 337 assembled microbial species-level genomes (95% ANI) with 1,978 public dereplicated MAGs (95% ANI) derived from short reads. A HiFi-assembled microbial genome was considered a match if its ANI to any short-read assembled MAG was higher than 95%. The unmatched microbial genomes are candidate novel strains and species. (C) Number of genomes, (D) average contig number, (E) average assembled genome size, (F) average N50 contig size, (G) average CheckM completeness, and (H) average CheckM contamination of the circular genomes, noncircular MAGs, and public chicken gut MAGs assembled from short reads.
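The matching rule in panels (A) and (B) amounts to a threshold test over pairwise ANI values. The sketch below assumes the ANI values (in percent) between one HiFi genome and each public MAG have already been computed by some external tool; the function name and input layout are hypothetical.

```python
from typing import Iterable


def has_short_read_match(ani_values: Iterable[float], threshold: float) -> bool:
    """True if any ANI to a short-read MAG is strictly higher than the threshold.

    Use threshold=99.0 for the strain-level comparison and 95.0 for the
    species-level comparison, following the caption.
    """
    return any(ani > threshold for ani in ani_values)


def is_candidate_novel(ani_values: Iterable[float], threshold: float) -> bool:
    """A genome with no match is treated as a candidate novel strain or species."""
    return not has_short_read_match(ani_values, threshold)
```

A genome with no computed ANI values at all (an empty list) counts as unmatched in this sketch, which may or may not reflect how the original analysis handled genomes without alignable hits.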
Figure 6:
Comparison of HiFi-derived gene catalog (HiFi-RGC) with 2 short-read-derived gene catalogs (CGM-RGC and GG-IGC). CGM-RGC refers to chicken gut metagenome—reference gene catalog published by Huang et al. [9] in 2018. GG-IGC refers to Gallus gallus—Integrated gene catalog published by Feng et al. [8] in 2021. (A) Gene number and (B) gene structure completeness ratio of the 3 gene catalogs. Overlap of HiFi-RGC and CGM-RGC (C) and GG-IGC (D). A confident overlap is defined by the criteria of sequence identity ≥95% and length overlap ≥90% of the shorter sequence.
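The confident-overlap criterion in this caption can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that "length overlap" means the aligned length measured against the shorter of the two gene sequences.

```python
def is_confident_overlap(identity_pct: float,
                         aligned_len: int,
                         len_a: int,
                         len_b: int) -> bool:
    """Identity >= 95% and aligned length >= 90% of the shorter sequence."""
    shorter = min(len_a, len_b)
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    covers_shorter = aligned_len * 10 >= shorter * 9
    return identity_pct >= 95.0 and covers_shorter
```

For example, a 900 bp alignment at 96% identity between a 1,000 bp gene and a 3,000 bp gene would qualify, since 900 bp is exactly 90% of the shorter sequence.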
Figure 7:
Phylogeny of the HiFi-assembled microbial genomes. Each colored clade corresponds to a phylum inferred by GTDB-Tk. Inside the largest phylum, Firmicutes, 5 genera (Ligilactobacillus, Limosilactobacillus, Lactobacillus, Weissella, and Enterococcus) are also colored for highlighting. The leaf nodes of the phylogenetic tree have 2 shapes: a solid circle represents a circular genome, and a hollow circle represents a noncircular MAG. The colors of the leaf nodes represent CheckM quality ranks: green represents near-complete assemblies, blue represents high-quality assemblies, and red represents medium-quality assemblies. The inner ring shows the GTDB-Tk classification, and a triangle indicates that the corresponding leaf node matches an existing genome in the GTDB database. The 5 outer rings show the sequencing coverage depth for each assembled microbial genome from each intestinal compartment. From inner to outer: duodenum, jejunum, ileum, cecum, and colorectum.

References

    1. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432(7018):695–716.
    2. Wong GK, Liu B, Wang J, et al. A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature. 2004;432(7018):717–22.
    3. Rubin CJ, Zody MC, Eriksson J, et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 2010;464(7288):587–91.
    4. Yeoman CJ, Chia N, Jeraldo P, et al. The microbiome of the chicken gastrointestinal tract. Anim Health Res Rev. 2012;13(1):89–99.
    5. Oakley BB, Lillehoj HS, Kogut MH, et al. The chicken gastrointestinal microbiome. FEMS Microbiol Lett. 2014;360(2):100–12.