Nature, 542 (7640), 237-241

New CRISPR-Cas Systems From Uncultivated Microbes


David Burstein et al. Nature.


CRISPR-Cas systems provide microbes with adaptive immunity by employing short DNA sequences, termed spacers, that guide Cas proteins to cleave foreign DNA. Class 2 CRISPR-Cas systems are streamlined versions, in which a single RNA-bound Cas protein recognizes and cleaves target sequences. The programmable nature of these minimal systems has enabled researchers to repurpose them into a versatile technology that is broadly revolutionizing biological and clinical research. However, current CRISPR-Cas technologies are based solely on systems from isolated bacteria, leaving the vast majority of enzymes from organisms that have not been cultured untapped. Metagenomics, the sequencing of DNA extracted directly from natural microbial communities, provides access to the genetic material of a huge array of uncultivated organisms. Here, using genome-resolved metagenomics, we identify a number of CRISPR-Cas systems, including the first reported Cas9 in the archaeal domain of life, to our knowledge. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, we discovered two previously unknown systems, CRISPR-CasX and CRISPR-CasY, which are among the most compact systems yet discovered. Notably, all required functional components were identified by metagenomics, enabling validation of robust in vivo RNA-guided DNA interference activity in Escherichia coli. Interrogation of environmental microbial communities combined with in vivo experiments allows us to access an unprecedented diversity of genomes, the content of which will expand the repertoire of microbe-based biotechnologies.

Conflict of interest statement

The Regents of the University of California have filed a provisional patent application related to the technology described in this work to the United States Patent and Trademark Office, in which D.B., L.B.H., S.C.S., J.A.D. and J.F.B. are listed as inventors.


Extended Data Figure 1. Multiple sequence alignment of newly described Cas9 proteins
Alignment of Cas9 proteins from ARMAN-1 and ARMAN-4, as well as two closely related Cas9 proteins from uncultivated bacteria, to the Actinomyces naeslundii Cas9, whose structure has been solved.
Extended Data Figure 2. Within-population variability of ARMAN-1 CRISPR arrays
Variability of reconstructed CRISPR arrays, including the most well represented (and thus assembled) sequences (Fig. 2) and array segments representing locus variants that were reconstructed from the short DNA reads. Variability is due to spacers that were present in only a subset of archaeal cells in the population, as well as spacers whose context differed due to spacer loss (indicated by black lines). White boxes indicate repeats and colored arrows indicate CRISPR spacers (spacers with different colors have different sequences, except for unique spacers that are black). In CRISPR systems, spacers are typically added unidirectionally, so the high variety of spacers on the left side is attributed to recent acquisition.
Extended Data Figure 3. Novelty of the reported CRISPR-Cas systems
a, Simplified phylogenetic tree of the universal Cas1 protein. CRISPR types of known systems are noted on the wedges and branches; the newly described systems are in bold. Detailed Cas1 phylogeny is provided in Supplementary Data 4. b, Proposed evolutionary scenario that gave rise to the archaeal type II system as a result of a recombination between type II-B and type II-C loci. c, Similarity of CasX and CasY to known proteins based on the following searches: (1) BLAST search against the non-redundant (NR) protein database of NCBI, (2) HMM search against an HMM database of known Cas proteins and (3) distant homology search using HHpred (E, E-value).
Extended Data Figure 4. Evolutionary tree of Cas9 homologs
Maximum-likelihood phylogenetic tree of Cas9 proteins, showing the previously described systems colored based on their type: II-A in blue, II-B in green and II-C in purple. The archaeal Cas9 proteins (in red) cluster with type II-C CRISPR-Cas systems, together with two newly described Cas9 proteins from uncultivated bacteria. A detailed tree is provided in Supplementary Data 5.
Extended Data Figure 5. ARMAN-1 spacers map to genomes of archaeal community members
a, Protospacers from ARMAN-1 map to the genome of ARMAN-2, a nanoarchaeon from the same environment. Six protospacers (red arrowheads) map uniquely to a portion of the genome flanked by two long-terminal repeats (LTRs), and two additional protospacers match perfectly within the LTRs (blue and green arrowheads). This region is likely a transposon, suggesting the CRISPR-Cas system of ARMAN-1 plays a role in suppressing mobilization of this element. b, Protospacers also map to a Thermoplasmatales archaeon (I-plasma), another member of the Richmond Mine ecosystem that is found in the same samples as ARMAN organisms. The protospacers cluster within a region of the genome encoding short, hypothetical proteins, suggesting this might also represent a mobile element. NCBI accessions are provided in parentheses.
Extended Data Figure 6. Archaeal Cas9 from ARMAN-4 with a degenerate CRISPR array is found on numerous contigs
Cas9 from ARMAN-4 is highlighted in dark red on 16 nearly identical contigs from different samples. Proteins with putative domains or functions are labeled whereas hypothetical proteins are unlabeled. Fifteen of the contigs contain two degenerate direct repeats (36 nt long with one mismatch) and a single conserved spacer of 36 nt. The remaining contig contains only one direct repeat. Unlike ARMAN-1, no additional Cas proteins are found adjacent to Cas9 in ARMAN-4.
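The idea of locating direct repeats that differ by a single mismatch can be sketched as a Hamming-distance scan. This is only an illustration, not the authors' pipeline: the function names and the short toy sequences (an 8-nt repeat rather than 36 nt) are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_degenerate_repeats(contig: str, repeat: str, max_mismatches: int = 1):
    """Return (start, mismatches) for each window within max_mismatches of repeat."""
    k = len(repeat)
    hits = []
    for start in range(len(contig) - k + 1):
        distance = hamming(contig[start:start + k], repeat)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


# Hypothetical toy data: one exact repeat, one spacer, one repeat with a mismatch.
repeat = "GTTTCAGA"
spacer = "ACGTACGTAC"
contig = "TT" + repeat + spacer + "GTTTCAGT" + "CC"

hits = find_degenerate_repeats(contig, repeat)
(first_start, _), (second_start, _) = hits
# The spacer is whatever lies between the end of the first repeat
# and the start of the second.
found_spacer = contig[first_start + len(repeat):second_start]
print(hits, found_spacer)
```

A linear scan like this is adequate for single contigs; real array-detection tools use more elaborate repeat-finding heuristics.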
Extended Data Figure 7. Predicted structures of guide RNA and purification schema for in vitro biochemistry studies
a, The CRISPR repeat and tracrRNA anti-repeat are depicted in black whereas the spacer-derived sequence is shown as a series of green N’s. No clear termination signal can be predicted from the locus, so three different tracrRNA lengths were tested based on their secondary structure – 69, 104, and 179 nt in red, blue, and pink, respectively. b, Engineered single-guide RNA corresponding to dual-guide in (a). c, Dual-guide for ARMAN-4 Cas9 with two different hairpins on 3′ end of tracrRNA (75 and 122 nt). d, Engineered single-guide RNA corresponding to dual-guide in (c). e, Conditions tested in E. coli in vivo targeting assay. f, ARMAN-1 (AR1) and ARMAN-4 (AR4) Cas9 were expressed and purified under a variety of conditions as outlined in the Methods section. Proteins outlined in blue boxes were tested for cleavage activity in vitro. g, Fractions of AR1-Cas9 and AR4-Cas9 purifications were separated on a 10% SDS-PAGE gel.
Extended Data Figure 8. Programmed DNA interference by CasX
a, Plasmid interference assays for CasX.1 (Deltaproteobacteria) and CasX.2 (Planctomycetes), continued from Figure 3c (sX1, CasX spacer 1; sX2, CasX spacer 2; NT, non-target). Experiments were conducted in triplicate and mean ± s.d. is shown. b, Serial dilution of E. coli expressing a CasX locus and transformed with the specified target, continued from Figure 3b. c, PAM depletion assays for the Deltaproteobacteria CasX and d, Planctomycetes CasX expressed in E. coli. PAM sequences depleted greater than the indicated PAM depletion value threshold (PDVT) compared to a control library were used to generate the sequence logo. e, Diagram depicting the location of Northern blot probes for CasX.1. f, Northern blots for CasX.1 tracrRNA in total RNA extracted from E. coli expressing the CasX.1 locus. The sequences of the probes used are provided in Supplementary Table 2.
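One common way to score PAM depletion against a control library is a ratio of library-normalized frequencies, filtered by a threshold such as the PDVT. The sketch below assumes that approach, with a pseudocount to avoid division by zero. The authors' exact computation is not given here, and the PAM labels and read counts are hypothetical toy values.

```python
def fold_depletion(control_counts, selected_counts, pseudocount=1.0):
    """Fold depletion per PAM: control frequency divided by selected frequency.

    Counts are normalized by library size; the pseudocount is an assumption
    made here so that PAMs absent after selection do not divide by zero.
    """
    control_total = sum(control_counts.values())
    selected_total = sum(selected_counts.values())
    result = {}
    for pam, control in control_counts.items():
        selected = selected_counts.get(pam, 0)
        control_freq = (control + pseudocount) / control_total
        selected_freq = (selected + pseudocount) / selected_total
        result[pam] = control_freq / selected_freq
    return result


def depleted_pams(depletion, threshold):
    """Return PAMs depleted more than `threshold`-fold, sorted alphabetically."""
    return sorted(pam for pam, fold in depletion.items() if fold > threshold)


# Hypothetical read counts per PAM in the control and selected libraries.
control = {"AACC": 1000, "AACG": 1000, "GGTT": 1000, "CTAG": 1000}
selected = {"AACC": 9, "AACG": 19, "GGTT": 1000, "CTAG": 972}

depletion = fold_depletion(control, selected)
print(depleted_pams(depletion, threshold=30))
print(depleted_pams(depletion, threshold=3))
```

The PAMs that pass the threshold would then be the input for a sequence logo. A stricter threshold keeps only the most strongly depleted sequences.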
Figure 1. CRISPR-Cas systems identified in uncultivated organisms
a, Ratio of lineages with and without isolated representatives in Bacteria and Archaea, based on 31 major lineages described by Hug et al. (2016). The results highlight the massive scale of as-yet little-investigated biology in these domains. Archaeal Cas9 and the novel CRISPR-CasY were found exclusively in lineages with no isolated representatives. b, Locus organization of the discovered CRISPR-Cas systems.
Figure 2. ARMAN-1 CRISPR array diversity and identification of the ARMAN-1 Cas9 PAM sequence
a, CRISPR arrays reconstructed from AMD samples. White boxes indicate repeats and colored diamonds indicate spacers (identical spacers are similarly colored; unique spacers are in black). The conserved region of the array is highlighted. The diversity of recently acquired spacers (on the left) indicates the system is active. Analysis of within-population CRISPR variability is presented in Extended Data Fig. 2. b, A single circular, putatively viral, contig contains 56 protospacers (red vertical bars) from the ARMAN-1 CRISPR arrays. c, Sequence analysis of 240 protospacers (Supplementary Table 1) revealed a conserved ‘NGG’ PAM downstream of the protospacers.
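Inferring a PAM from sequences flanking the protospacers can be illustrated by collecting the bases immediately downstream of each match and tallying them per position. The sketch below assumes exact, forward-strand matches only and an arbitrary 90% conservation cutoff. All sequences and function names are hypothetical and this is not the authors' analysis.

```python
from collections import Counter


def downstream_flank(target, protospacer, flank_len=3):
    """Return flank_len bases immediately 3' of the first exact match, or None."""
    pos = target.find(protospacer)
    if pos == -1:
        return None
    end = pos + len(protospacer)
    flank = target[end:end + flank_len]
    return flank if len(flank) == flank_len else None


def position_frequencies(flanks):
    """Per-position base frequencies across equal-length flanks."""
    n = len(flanks)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*flanks)
    ]


def consensus(freqs, min_fraction=0.9):
    """Emit the dominant base where it reaches min_fraction, otherwise 'N'."""
    return "".join(
        max(col, key=col.get) if max(col.values()) >= min_fraction else "N"
        for col in freqs
    )


# Hypothetical (target, protospacer) pairs; each protospacer is followed by xGG.
pairs = [
    ("AAACGATTACAGATTACAAGGTT", "GATTACAGATTACA"),
    ("TTTTCCGGTTAACCGGTTTGGAC", "CCGGTTAACCGGTT"),
    ("ACACTTGACCTTGACCAACGGAT", "TTGACCTTGACCAA"),
    ("ATGGCATCGGCATCAAGGGTT", "GGCATCGGCATCAA"),
]
flanks = [downstream_flank(target, proto) for target, proto in pairs]
pam = consensus(position_frequencies(flanks))
print(flanks, pam)
```

A real analysis would also search the reverse complement and tolerate mismatches; both are omitted here for brevity.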
Figure 3. CRISPR-CasX is a dual-guided system that mediates programmable DNA interference in E. coli
a, Diagram of CasX plasmid interference assays. b, Serial dilution of E. coli expressing the Planctomycetes CasX locus with spacer 1 (sX1) and transformed with the specified target (sX1, CasX protospacer 1; sX2, CasX protospacer 2; NT, non-target). c, Plasmid interference by Deltaproteobacteria CasX, using the same spacers and targets as in (b). d, PAM depletion assays for the Planctomycetes CasX locus expressed in E. coli. Sequence logo was generated from PAM sequences depleted more than 30-fold compared to a control library (see also Extended Data Fig. 8). e, Diagram of CasX DNA interference. f, Mapping of environmental RNA sequences to the CasX CRISPR locus (red arrow, putative tracrRNA; white boxes, repeats; green diamonds, spacers); inset: detailed view of mapping to first repeat and spacer. g, Plasmid interference assays with the putative tracrRNA knocked out of the CasX locus and CasX coexpressed with a crRNA alone, a truncated sgRNA or a full-length sgRNA (T, target; NT, non-target). Experiments presented in (c) and (g) were conducted in triplicate and mean ± s.d. is shown.
Figure 4. Expression of a CasY locus in E. coli is sufficient for DNA interference
a, Diagrams of CasY loci and neighboring proteins. b, Sequence logo of the 658 5′ PAM sequences depleted greater than 3-fold by CasY relative to a control library. c, Plasmid interference by E. coli expressing CasY.1 and CRISPR array expressed with a heterologous promoter and transformed with targets containing the indicated PAM. Experiments were conducted in triplicate and mean ± s.d. is shown.
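The "mean ± s.d." reported for triplicate experiments can be computed as follows. This is a minimal sketch: the colony counts are hypothetical, and using the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_replicates(values):
    """Return (mean, sample standard deviation) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return mean(values), stdev(values)


# Hypothetical colony counts from three replicate transformations.
replicates = [100, 110, 120]
avg, sd = summarize_replicates(replicates)
print(f"{avg:.1f} ± {sd:.1f}")
```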
