Review
Nat Rev Microbiol, 13 (11), 722-36

An Updated Evolutionary Classification of CRISPR-Cas Systems

Kira S Makarova et al. Nat Rev Microbiol.

Abstract

The evolution of CRISPR-cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR-cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR-Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.
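
As a rough illustration of how signature protein families can partition loci into classes and types, the sketch below maps one commonly cited signature gene per type (cas3 for type I, cas9 for type II, cas10 for type III, csf1 for type IV, cpf1 for type V) to a class and type. The table layout, the function name `classify_locus` and the `None` fallback for ambiguous loci are assumptions made for illustration. The published approach also weighs the architecture of cas loci, which this sketch does not attempt to reproduce.

```python
# Hypothetical lookup table: signature gene -> (class, type).
# Illustrative only; not the authors' actual classification procedure.
SIGNATURE_GENES = {
    "cas3": ("Class 1", "Type I"),
    "cas10": ("Class 1", "Type III"),
    "csf1": ("Class 1", "Type IV"),
    "cas9": ("Class 2", "Type II"),
    "cpf1": ("Class 2", "Type V"),
}


def classify_locus(genes):
    """Return (class, type) for a list of cas gene names.

    Returns None when no signature gene is found or when signature
    genes of more than one type are present (an ambiguous locus).
    """
    hits = {
        SIGNATURE_GENES[g.lower()]
        for g in genes
        if g.lower() in SIGNATURE_GENES
    }
    if len(hits) == 1:
        return next(iter(hits))
    return None
```

A locus given as `["cas1", "cas2", "cas9"]` would be assigned to class 2, type II, whereas a locus with only the adaptation genes `["cas1", "cas2"]` would be left unclassified, loosely mirroring the incomplete or ambiguous loci mentioned above.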

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Functional classification of Cas proteins
Protein names follow the current nomenclature and classification. An asterisk indicates that the putative small subunit (SS) protein is instead fused to Cas8 (the type I system large subunit (LS)) in several type I subtypes. The type III system LS and type IV system LS are Cas10 and Csf1 (a Cas8 family protein), respectively. Dispensable components are indicated by dashed outlines. Cas6 is shown with a solid outline for type I because it is dispensable in some but not most systems and with a dashed outline for type III because most systems lack this gene and use the Cas6 provided in trans by other CRISPR–cas loci. The two colours for Cas4 and three colours for Cas9 reflect that these proteins contribute to different stages of the CRISPR–Cas response. The functions shown for type IV and type V system components are proposed based on homology to the cognate components of other systems, and have not yet been experimentally verified. The functional assignments for Cpf1 are tentatively inferred by analogy with Cas9 (only the RuvC (and TnpB)-like domains of the two proteins are homologous). CARF, CRISPR-associated Rossmann fold; pre-crRNA, pre-CRISPR RNA. This research was originally published in Biochem. Soc. Trans. Makarova K. S., Wolf Y. I., & Koonin E. V. The basic building blocks and evolution of CRISPR–Cas systems. Biochem. Soc. Trans. 2013; 41: 1392–1400 © The Biochemical Society.
Figure 2. Architectures of the genomic loci for the subtypes of CRISPR–Cas systems
Typical operon organization is shown for each CRISPR–Cas system subtype. For each representative genome, the respective gene locus tag names are indicated for each subunit. Homologous genes are colour-coded and identified by a family name. The gene names follow the classification from REF. . Where both a systematic name and a legacy name are commonly used, the legacy name is given under the systematic name. The small subunit is encoded by either csm2, cmr5, cse2 or csa5; no all-encompassing name has been proposed to collectively describe this gene family to date. Crosses through genes encoding the large subunit (Cas8 or Cas10 family members) indicate inactivation of the respective catalytic sites. Genes and gene regions encoding components of the interference module (CRISPR RNA (crRNA)–effector complexes or Cas9 proteins) are highlighted with a beige background. The adaptation module (cas1 and cas2) and cas6 are dispensable in subtypes III-A and III-B; in particular, they are rarely present in subtype III-B (dashed lines). Dark green denotes the CARF domain. Gene regions coloured cream represent the HD nuclease domain; the HD domain in Cas10 is distinct from that of Cas3 and Cas3″. Also coloured are the regions of cas9 that roughly correspond to the RuvC-like nuclease (lime green), HNH nuclease (yellow), recognition lobe (purple) and protospacer adjacent motif (PAM)-interacting domains (pink). The regions of cpf1 aside from the RuvC-like domain are functionally uncharacterized and are shown in grey, as is the functionally uncharacterized all1473 gene in subtype III-D.
Figure 3. Distribution of CRISPR–Cas systems in sequenced archaeal and bacterial genomes
a | Distribution by types. Chart showing the proportions of identified CRISPR–cas loci in bacterial or archaeal genomes that encode type I, type II, type III, type IV or type V CRISPR–Cas systems. The proportion of loci that encode incomplete systems or that we could not classify unambiguously is also shown. b | Distribution by subtypes. Chart showing the proportions of identified CRISPR–cas loci in bacterial or archaeal genomes that encode each of the subtypes of CRISPR–Cas systems included in the new classification described in this article. Note that type IV and V loci each encompass a single subtype. The proportion of loci that encode incomplete systems or that we could not classify unambiguously is also shown.
Figure 4. Comparison of different classifications of CRISPR–Cas systems
This graph shows the strength of correlation between the new classification of CRISPR–Cas systems described here (‘subtypes’; in the centre of the graph) and other classification measures. ‘Interference genes tree’ represents a phylogeny of interference module genes, which encode multisubunit CRISPR RNA (crRNA)–effector complexes or Cas9 proteins. This tree was created using a simple clustering approach based on aggregate protein sequence similarity. ‘Adaptation genes tree’ represents clustering produced by the same method but based on both components of the adaptation module, Cas1 and Cas2. ‘Cas1 phylogeny’ is the phylogenetic tree of Cas1 proteins shown in FIG. 5. ‘Loci architecture tree’ represents clustering based on a quantitative measure we developed to compare the architectures of CRISPR–cas loci. The measure is based on a weighted similarity index of the order of cas genes. ‘Repeats (sequence)’ denotes the classification of CRISPR sequences into 24 families on the basis of sequence similarity. ‘Repeats (structure)’ denotes the classification of CRISPR sequences into 18 families on the basis of structural similarity. The species tree represents the phylogeny of bacterial and archaeal translation systems. The distances depicted are inversely proportional to the degree of similarity. The full similarity matrix is shown in Supplementary information S11 (table).
Figure 5. Mapping of the CRISPR–Cas classification onto the phylogenetic tree of Cas1
Subtypes from the new classification of CRISPR–Cas systems described here were mapped onto a sequence-based phylogenetic reconstruction of 1,418 proteins from the Cas1 family, which is the most conserved Cas protein family. The phylogeny shows a close agreement with the subtype classification, as subtypes I-A, I-C, I-E, I-F, I-U, II-A, II-B, and putative type V are mostly or strictly monophyletic and are shown in gradients of light grey, except for II-B, which is shown in dark grey to indicate its origin from within I-A. The more discordant distribution of Cas1 for other subtypes probably results from horizontal transfer. None of the type III subtypes is monophyletic (in contrast to the Cas10 tree shown in Supplementary information S9 (box)), and so type III subtypes are not indicated. Note that Cas1 is absent in type IV loci and so these putative CRISPR–Cas systems are not shown. Triangles denote multiple collapsed branches. Individual genes are labelled with species names and gene identification numbers. Bootstrap values are indicated as percentage points; values below 50% are not shown.
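
The Figure 4 caption mentions a weighted similarity index based on the order of cas genes but does not give its definition. The sketch below is one possible stand-in, not the authors' measure: a weighted Jaccard index over adjacent gene pairs. The function names, the per-gene `weights` dictionary and the default weight of 1.0 are all assumptions made for illustration.

```python
def adjacent_pairs(genes):
    """Return the set of unordered adjacent gene pairs in a locus.

    Pairs are unordered so that reading the locus from either strand
    gives the same result.
    """
    return {frozenset(pair) for pair in zip(genes, genes[1:])}


def architecture_similarity(locus_a, locus_b, weights=None):
    """Weighted Jaccard index over adjacent gene pairs (hypothetical measure).

    `weights` maps a gene name to a positive weight; genes that are not
    listed default to 1.0. A pair's weight is the sum of its genes' weights.
    """
    weights = weights or {}
    pairs_a = adjacent_pairs(locus_a)
    pairs_b = adjacent_pairs(locus_b)

    def pair_weight(pair):
        return sum(weights.get(gene, 1.0) for gene in pair)

    union = pairs_a | pairs_b
    if not union:
        # Loci with fewer than two genes have no adjacency information.
        return 0.0
    shared = sum(pair_weight(p) for p in pairs_a & pairs_b)
    total = sum(pair_weight(p) for p in union)
    return shared / total
```

Two loci with identical gene order score 1.0 and loci with no shared adjacent pairs score 0.0. Under this assumed measure, a value such as `1 - architecture_similarity(a, b)` could serve as the kind of distance fed to a clustering step.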
