Cell Host Microbe. 2021 Jan 13;29(1):94-106.e4.
doi: 10.1016/j.chom.2020.10.010. Epub 2020 Nov 19.

Identification of Natural CRISPR Systems and Targets in the Human Microbiome

Philipp C Münch et al. Cell Host Microbe. 2021.

Abstract

Many bacteria resist invasive DNA by incorporating sequences into CRISPR loci, which enable sequence-specific degradation. CRISPR systems have been well studied from isolate genomes, but culture-independent metagenomics provides a new window into their diversity. We profiled CRISPR loci and cas genes in the body-wide human microbiome using 2,355 metagenomes, yielding functional and taxonomic profiles for 2.9 million spacers by aligning the spacer content to each sample's metagenome and corresponding gene families. Spacer and repeat profiles agree qualitatively with those from isolate genomes but expand their diversity by approximately 13-fold, with the highest spacer load present in the oral microbiome. The taxonomy of spacer sequences parallels that of their source community, with functional targets enriched for viral elements. When coupled with cas gene systems, CRISPR-Cas subtypes are highly site and taxon specific. Our analysis provides a comprehensive collection of natural CRISPR-cas loci and targets in the human microbiome.

Keywords: CRISPR system; CRISPR-Cas; bacteriophages; metagenomics; viral defense.

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1: High consistency of HMP spacer sequences with public databases and presence of a length-specific GC bias.
A) Sequence lengths of spacers were largely consistent across body areas, ranging from a minimum of 28 nucleotides to a tail permitted up to 43 nucleotides. B) HMP spacers were highly similar to CRISPRCasdb spacers in position-wise nucleotide composition normalised by spacer length and showed a palindromic pattern in both datasets. C) Nucleotide composition stratified by spacer length showed a consistent pattern for HMP- and CRISPRCasdb-derived spacer sequences. D) Stability of repeat sequences (as measured by Bray–Curtis dissimilarity of k-mer counts of repeat sequences) across (i) technical replicates, (ii) samples taken from the same individuals over time and (iii) randomly selected individuals, respectively. Samples containing fewer than 25 repeats are not shown. E) HMP samples generally contain few CRISPR repeats that are sample-specific (singleton repeats). The histogram shows the proportion of singleton repeats among all repeats per sample for all samples.
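The repeat-stability measure in panel D can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the k-mer size (`k=4`), the function names, and the toy sequences are assumptions made for illustration only. It counts overlapping k-mers over the repeat sequences of two samples and computes the Bray–Curtis dissimilarity between the two count profiles.

```python
from collections import Counter


def kmer_counts(sequences, k=4):
    """Count all overlapping k-mers across a collection of repeat sequences.

    k=4 is an illustrative choice, not a value taken from the paper.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    shared_keys = counts_a.keys() & counts_b.keys()
    shared = sum(min(counts_a[key], counts_b[key]) for key in shared_keys)
    return 1.0 - 2.0 * shared / total


# Toy repeat sets for two hypothetical samples.
sample_a = kmer_counts(["ACGTA"])
sample_b = kmer_counts(["ACGTT"])
dissimilarity = bray_curtis(sample_a, sample_b)
```

A value of 0 means the two samples have identical k-mer profiles and a value of 1 means they share no k-mers, so lower values across replicates or time points indicate more stable repeat content.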
Figure 2: High body-site dependent differences in spacer loads (regardless of host or target) on the HMP1-II dataset.
A) Three oral-associated body sites (supra- and subgingival plaque and tongue) have significantly increased CRISPR spacer counts (Wilcoxon rank sum test on spacer counts, P < 10⁻⁶) relative to other body sites, such as the urogenital and skin microbiota. Mean values (points for spacers and triangles for repeats) and SD (lines) of the read-depth normalised load per body site are shown for observed reads and repeats and for cluster representatives to account for repetitive sequences. B) The lengths of observed CRISPR spacer and repeat sequences are consistent between most body sites, especially between gut and oral samples, but different from the spacers and repeats present in CRISPRCasdb. Mean (points and triangles) and SD (lines) of the sizes of the spacer and repeat sequences across body sites and within CRISPRCasdb (grey). C) Correlation of species richness (number of species exceeding 1% relative abundance) and spacer load (cluster representatives, defined as the longest sequence within a cluster of > 80% sequence identity) of selected samples.
Figure 3: Body site dependence and a high overall taxonomic agreement between all observed HMP1-II spacers and the general community.
A) Overview of the relative abundances of the seven most enriched taxa for the overall HMP microbiota (bottom) and for taxonomic assignments to HMP CRISPR spacers (top). B) PCoA of spacers per body area (based on Bray–Curtis dissimilarities at the order level) shows most variation to be driven by distinct stool communities and variation among oral samples. Point size indicates the number of spacer clusters (at 80% identity) per sample.
Figure 4: Functional enrichments within predicted spacer targets.
A) Log fold enrichment of Gene Ontology (GO) terms for all spacer targets within sample assemblies per body site. Terms shown here achieved at least one FDR-corrected q value < 0.05 based on a Fisher test of enriched UniRef90 terms with respect to the overall contig annotation of the site (STAR Methods). Corresponding spacer targets by the global UniRef90 approach of remaining spacers are shown in Fig. S7. B) GO terms of spacers matching contigs outside of CRISPR cassettes without any phage-related term on whole contigs, thus potentially within bacterial chromosomes. In both panels, a plus sign denotes a q value < 0.05, and the GO groups cellular component (CC) and biological process (BP) are shown (full version shown in Fig. S8).
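The enrichment statistics named in this caption can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the procedure from the STAR Methods: the contingency layout (targets with a term versus all annotated genes at the site), the one-sided test, the base-2 logarithm, the Benjamini-Hochberg procedure as the FDR correction, and all function names are assumptions.

```python
from math import comb, log2


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value, P(X >= k) under the hypergeometric model.

    k: spacer targets annotated with the term
    n: all spacer targets
    K: genes annotated with the term in the site-wide annotation
    N: all annotated genes at the site
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def log_fold_enrichment(k, n, K, N):
    """log2 of (term frequency among targets) / (term frequency in background)."""
    if k == 0:
        return float("-inf")
    return log2((k / n) / (K / N))


def benjamini_hochberg(p_values):
    """Convert p-values to FDR-adjusted q values (Benjamini-Hochberg)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy example: 3 of 3 targets carry a term found in 3 of 10 background genes.
p = fisher_enrichment_p(3, 3, 3, 10)
lfe = log_fold_enrichment(3, 3, 3, 10)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this sketch a term would be marked with a plus sign when its q value falls below 0.05, matching the threshold stated in the caption.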
Figure 5: Differences and similarities of cas gene abundances across body sites stratified by contributing species.
The height of each set of stacked bars (y axis) indicates the total cas abundance within a single sample, normalised for gene length and sequencing depth, on a log10 scale. The taxonomic stratifications (species, “other,” and “unclassified”) are linearly (proportionally) scaled within the total bar height. Highlighted taxa account for at least 35% of overall species abundance for each cas gene. Samples (bars) are ordered according to the global Bray–Curtis dissimilarity of the full microbiota within the body areas. Body areas with fewer than 30 samples are not shown. The y axis scale can be negative to facilitate the visualization of small abundances.
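The bar construction described in this caption can be illustrated with a short Python sketch. The caption does not give the normalisation formula, so the reads-per-kilobase-per-million form used here is an assumption, as are the function names and the toy numbers. The sketch shows why bar heights can be negative (log10 of an abundance below 1) and how species contributions are scaled proportionally within the total height.

```python
from math import log10


def normalised_abundance(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads (assumed formula)."""
    return read_count / (gene_length_bp / 1_000) / (total_reads / 1_000_000)


def log10_bar_height(read_count, gene_length_bp, total_reads):
    """Total bar height on a log10 scale; negative when the abundance is below 1."""
    value = normalised_abundance(read_count, gene_length_bp, total_reads)
    return log10(value) if value > 0 else float("-inf")


def stratified_heights(total_height, contributions):
    """Split the total bar height linearly in proportion to each contributor."""
    total = sum(contributions.values())
    return {name: total_height * value / total for name, value in contributions.items()}


# Hypothetical cas gene: 50 mapped reads, 2,000 bp long, 100 million reads sequenced.
height = log10_bar_height(50, 2_000, 100_000_000)
segments = stratified_heights(2.0, {"species A": 3.0, "unclassified": 1.0})
```

Because only the total is log-transformed, the segment sizes within a bar reflect linear shares of the sample's cas abundance rather than differences in log abundance.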
