PLoS Genet. 2012;8(6):e1002441.
doi: 10.1371/journal.pgen.1002441. Epub 2012 Jun 13.

Diverse CRISPRs Evolving in Human Microbiomes

Free PMC article

Mina Rho et al. PLoS Genet. 2012.


CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, together with cas (CRISPR-associated) genes, form the CRISPR/Cas adaptive immune system, a primary defense strategy that eubacteria and archaea mobilize against foreign nucleic acids, including phages and conjugative plasmids. Short spacer sequences separated by the repeats are derived from foreign DNA and direct interference against future infections. The availability of hundreds of shotgun metagenomic datasets from the Human Microbiome Project (HMP) enables us to explore the distribution and diversity of known CRISPRs in human-associated microbial communities and to discover new CRISPRs. We propose a targeted assembly strategy to reconstruct CRISPR arrays, which whole-metagenome assemblies fail to identify. For each known CRISPR type (identified from reference genomes), we use its direct repeat consensus sequence to recruit reads from each HMP dataset and then assemble the recruited reads into CRISPR loci; the unique spacer sequences can then be extracted for analysis. We also identified novel CRISPRs or new CRISPR variants in contigs from whole-metagenome assemblies and used targeted assembly to more comprehensively identify these CRISPRs across samples. We observed that the distributions of CRISPRs (including 64 known and 86 novel ones) are largely body-site specific. We provide detailed analysis of several CRISPR loci, including novel CRISPRs. For example, known streptococcal CRISPRs were identified in most oral microbiomes, totaling ∼8,000 unique spacers: samples resampled from the same individual and oral site shared the most spacers; different oral sites from the same individual shared significantly fewer, while different individuals had almost no common spacers, indicating the impact of subtle niche differences on the evolution of CRISPR defenses. We further demonstrate potential applications of CRISPRs to the tracing of rare species and the virus exposure of individuals. This work indicates the importance of effective identification and characterization of CRISPR loci to the study of the dynamic ecology of microbiomes.
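The recruit-then-extract idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the mismatch tolerance, and the Hamming-distance matching are assumptions made for illustration, reverse-complement matching is omitted, and the assembly of recruited reads into loci (which would need a real assembler) is skipped, so spacers are extracted directly from a sequence that already contains the array.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeats(sequence, repeat, max_mismatches=2):
    """Start positions where `repeat` occurs in `sequence`.

    A window counts as a repeat copy if it differs from the consensus at
    no more than `max_mismatches` positions (an assumed tolerance).
    """
    k = len(repeat)
    hits = []
    i = 0
    while i + k <= len(sequence):
        if hamming(sequence[i:i + k], repeat) <= max_mismatches:
            hits.append(i)
            i += k  # jump past this repeat copy
        else:
            i += 1
    return hits


def recruit_reads(reads, repeat, max_mismatches=2):
    """Keep only reads that contain at least one copy of the repeat."""
    return [r for r in reads if find_repeats(r, repeat, max_mismatches)]


def extract_spacers(sequence, repeat, max_mismatches=2):
    """Unique sequences lying between consecutive repeat copies."""
    k = len(repeat)
    hits = find_repeats(sequence, repeat, max_mismatches)
    spacers = set()
    for left, right in zip(hits, hits[1:]):
        spacer = sequence[left + k:right]
        if spacer:
            spacers.add(spacer)
    return spacers
```

In practice, `recruit_reads` would be run once per dataset and per repeat consensus, the recruited reads would be assembled, and `extract_spacers` would then be applied to the resulting contigs.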

Conflict of interest statement

The authors have declared that no competing interests exist.


Figure 1. A diagram of the targeted assembly approach for CRISPR.
Figure 2. A potentially novel CRISPR array identified in a contig (9848 bases) from sample SRS012279.
(A) This CRISPR array has 6 copies of the repeat (repeat sequences shown in red font and spacers shown in blue). (B) Our annotation of this contig, in which the CRISPR array is highlighted in red. We first predicted ORFs in this contig using FragGeneScan, and then searched the predicted proteins against the nr protein database with BLAST to retrieve annotations; for example, the predicted Cas1 is similar to the Cas1 protein identified in Leptotrichia buccalis C-1013-b (accession ID: YP_003163976), with 60% sequence identity and 80% sequence similarity.
Figure 3. Visualizations of the CRISPR network of 150 CRISPRs, each represented as a node.
Two nodes are connected by an edge if the edit distance between the consensus repeat sequences of the corresponding CRISPRs is <10. Edges with small edit distances (i.e., the two CRISPRs share more similar repeats) are shown as thick lines and edges with larger edit distances as thin lines. In (A), the known CRISPRs are shown as blue nodes (except for several CRISPRs highlighted in green), and the novel CRISPRs identified in the HMP datasets are shown as red nodes. In (B), the nodes are colored according to the body site in which the CRISPRs are most frequently found. CRISPRs are labeled as rare if they were found in <5 samples; otherwise, they are assigned to particular body site(s) if they are found in more than 10 percent of the samples for that body site (e.g., stool+skin). The figures were prepared using Cytoscape.
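The two rules in this caption (an edge when the edit distance between repeat consensus sequences is <10, and the rare/body-site labeling) can be sketched in Python as follows. This is an illustrative reading of the caption, not the authors' code: the function names and input layouts are hypothetical, Levenshtein distance is assumed as the edit distance, and the `"unassigned"` label for CRISPRs that meet neither rule is an assumption, since the caption does not say how such cases were handled.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def repeat_network(repeats, max_distance=10):
    """Edges (name1, name2, distance) for repeat pairs with distance < max_distance.

    `repeats` maps a CRISPR name to its consensus repeat sequence.
    """
    edges = []
    for (n1, s1), (n2, s2) in combinations(repeats.items(), 2):
        d = edit_distance(s1, s2)
        if d < max_distance:
            edges.append((n1, n2, d))
    return edges


def assign_body_sites(found, totals, rare_cutoff=5, site_fraction=0.10):
    """Label one CRISPR as 'rare' or by body site(s), following the caption.

    `found` maps body site -> number of samples containing the CRISPR;
    `totals` maps body site -> total number of samples for that site.
    """
    if sum(found.values()) < rare_cutoff:
        return "rare"
    sites = sorted(site for site, n in found.items()
                   if totals[site] and n / totals[site] > site_fraction)
    # "unassigned" is an assumed fallback, not taken from the source.
    return "+".join(sites) if sites else "unassigned"
```

The edge list could be exported as a tab-separated table and loaded into a network viewer, with the distance column mapped to line width.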
Figure 4. Distribution of CRISPRs across body sites.
In this figure, the x-axis represents 150 CRISPRs and the y-axis represents the total number of samples in which instances of each CRISPR are found. Note that there are roughly one third as many stool samples as oral samples, probably explaining the apparently smaller number of CRISPRs in the gut microbiome. See Table S3 for details of the distribution of CRISPRs across body sites.
Figure 5. Sharing of streptococcal CRISPR spacers among samples from 6 individuals.
In this map, the rows are the 761 spacers (clustered at 98% identity) identified in one or more of these 6 individuals, and the columns are samples (e.g., Stool_v1_p1 indicates a stool sample from individual 1 at visit 1; Tongue_v2_p1 indicates a tongue sample from individual 1 at visit 2). Buccal stands for buccal mucosa, and SupraPlaque stands for supragingival plaque. The red lines indicate the presence of spacers in each of the samples. Multiple lines in the same row represent a spacer that is shared by multiple samples.
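A presence/absence map of this kind can be built from per-sample spacer sets, as in the minimal Python sketch below. It assumes spacers have already been clustered and are represented by cluster identifiers (the 98% identity clustering itself is not shown), and the function names are hypothetical. Sample names are parsed using the Site_vN_pM pattern given in the caption.

```python
from itertools import combinations


def parse_sample(name):
    """Split a name such as 'Tongue_v2_p1' into (site, visit, individual)."""
    site, visit, person = name.rsplit("_", 2)
    return site, int(visit[1:]), int(person[1:])


def shared_spacers(samples):
    """Number of spacer clusters shared by each pair of samples.

    `samples` maps a sample name to the set of spacer cluster IDs found in it.
    """
    return {(a, b): len(samples[a] & samples[b])
            for a, b in combinations(sorted(samples), 2)}


def presence_matrix(samples):
    """Rows are spacers, columns are samples; 1 marks presence, 0 absence."""
    spacers = sorted(set().union(*samples.values()))
    columns = sorted(samples)
    rows = [[int(sp in samples[col]) for col in columns] for sp in spacers]
    return spacers, columns, rows
```

Grouping the pairwise counts by the parsed individual and site would allow the comparison described in the abstract: same individual and site across visits, different sites in one individual, and different individuals.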
Figure 6. Traces of viral sequences in the streptococcal CRISPRs in human microbiomes.
(A) A two-way clustering of viral genomes and the HMP datasets based on the presence patterns of viral sequences in the CRISPR loci identified in the HMP datasets: the columns are the viral genomes, and the rows are HMP datasets. It shows that the genome of Streptococcus phage PH10 (NC_012756) has the most regions that are similar to the spacers in streptococcal CRISPRs. This figure was prepared using the heatmap function in R, with the default clustering method (hclust) and distance measure (Euclidean). (B) Mapping of the spacers onto the 31,276-base genome of Streptococcus phage PH10. Each vertical line shows a potential proto-spacer, a region in the virus genome that is similar to a spacer found in HMP datasets; lines of the same color show sets of proto-spacers identified from the same HMP dataset (other individual proto-spacers are shown as gray lines); the ORFs are shown as arrows (the red arrow is an integrase and the green arrow is annotated as endolysin).
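The mapping in panel (B) can be approximated with the short Python sketch below, which locates candidate proto-spacers on a viral genome. It is a simplification rather than the authors' procedure: the caption refers to regions that are similar to spacers, whereas this sketch assumes exact matches only, on either strand. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text, pattern):
    """All start positions of `pattern` in `text`, including overlapping ones."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def map_spacers(genome, spacers_by_dataset):
    """Candidate proto-spacers as sorted (position, strand, dataset, spacer) tuples.

    `spacers_by_dataset` maps a dataset name to a set of spacer sequences.
    Only exact matches are reported (an assumed simplification).
    """
    hits = []
    for dataset, spacers in spacers_by_dataset.items():
        for spacer in sorted(spacers):
            if not spacer:
                continue  # skip empty strings, which would match everywhere
            queries = (("+", spacer), ("-", reverse_complement(spacer)))
            for strand, query in queries:
                for pos in find_all(genome, query):
                    hits.append((pos, strand, dataset, spacer))
    return sorted(hits)
```

Counting hits per genome and per dataset would give a presence table of the kind clustered in panel (A); tolerating mismatches would require an alignment tool instead of exact string search.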


