Nat Rev Microbiol. 2011 Jun;9(6):467-77.
doi: 10.1038/nrmicro2577. Epub 2011 May 9.

Evolution and Classification of the CRISPR-Cas Systems

Free PMC article

Kira S Makarova et al. Nat Rev Microbiol.

The CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) modules are adaptive immunity systems that are present in many archaea and bacteria. These defence systems are encoded by operons that have an extraordinarily diverse architecture and a high rate of evolution for both the cas genes and the unique spacer content. Here, we provide an updated analysis of the evolutionary relationships between CRISPR-Cas systems and Cas proteins. Three major types of CRISPR-Cas system are delineated, with a further division into several subtypes and a few chimeric variants. Given the complexity of the genomic architectures and the extremely dynamic evolution of the CRISPR-Cas systems, a unified classification of these systems should be based on multiple criteria. Accordingly, we propose a 'polythetic' classification that integrates the phylogenies of the most common cas genes, the sequence and organization of the CRISPR repeats and the architecture of the CRISPR-cas loci.


Figure 1
Figure 1. The three stages of CRISPR–Cas action
CRISPR-Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems act in three stages: adaptation, expression and interference. In type I and type II CRISPR-Cas systems, but not in type III systems, the selection of proto-spacers in invading nucleic acid probably depends on a proto-spacer-adjacent motif (PAM), but how the PAM or the nucleic acid is recognized is still unclear. After the initial recognition step, Cas1 and Cas2 most probably incorporate the proto-spacers into the CRISPR locus to form spacers. During the expression stage, the CRISPR locus containing the spacers is expressed, producing a long primary CRISPR transcript (the pre-crRNA). The CRISPR-associated complex for antiviral defence (Cascade) binds the pre-crRNA, which is then cleaved by the Cas6e or Cas6f subunits (in subtype I-E or I-F, respectively), resulting in crRNAs with a typical 8-nucleotide repeat fragment on the 5′ end and the remainder of the repeat fragment, which generally forms a hairpin structure, on the 3′ flank. Type II systems use a trans-encoded small RNA (tracrRNA) that pairs with the repeat fragment of the pre-crRNA, followed by cleavage within the repeats by the housekeeping RNase III in the presence of Cas9 (formerly known as Csn1 or Csx12). Subsequent maturation might occur by cleavage at a fixed distance within the spacers, probably catalysed by Cas9. In type III systems, Cas6 is responsible for the processing step, but the crRNAs seem to be transferred to a distinct Cas complex (called Csm in subtype III-A systems and Cmr in subtype III-B systems). In subtype III-B systems, the 3′ end of the crRNA is trimmed further. During the interference step, the invading nucleic acid is cleaved. In type I systems, the crRNA guides the Cascade complex to targets that contain the complementary DNA, and the Cas3 subunit is probably responsible for cleaving the invading DNA.
The PAM probably also plays an important part in target recognition in type I systems. In type II and type III systems, no Cas3 orthologue is involved (TABLE 2). In type II systems, Cas9 loaded with crRNA probably directly targets invading DNA, in a process that requires the PAM. The two subtypes of CRISPR–Cas type III systems target either DNA (subtype III-A systems) or RNA (subtype III-B systems). In type III systems, a chromosomal CRISPR locus and an invading DNA fragment are distinguished by either base pairing to the 5′ repeat fragment of the mature crRNA (resulting in no interference) or no base pairing (resulting in interference). Filled triangles represent experimentally characterized nucleases, and unfilled triangles represent nucleases that have not yet been identified.
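The self versus non-self rule described for type III systems can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not a description of the actual molecular mechanism: the function names, the 8-nucleotide tag sequence, the flank sequences and the all-or-nothing pairing threshold are all hypothetical, and both sequences are assumed to be supplied already aligned so that position i of the crRNA tag sits opposite position i of the target flank.

```python
# Toy model of type III self/non-self discrimination (illustrative only).
# Watson-Crick partner in DNA for each RNA base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def paired_fraction(crrna_tag: str, target_flank: str) -> float:
    """Return the fraction of aligned positions that can base pair.

    Both strings are assumed to be given in pairing register:
    position i of the RNA tag faces position i of the DNA flank.
    """
    if not crrna_tag or len(crrna_tag) != len(target_flank):
        raise ValueError("tag and flank must be non-empty and of equal length")
    pairs = sum(
        RNA_TO_DNA_PARTNER.get(rna_base) == dna_base
        for rna_base, dna_base in zip(crrna_tag, target_flank)
    )
    return pairs / len(crrna_tag)


def interference_expected(crrna_tag: str, target_flank: str,
                          self_threshold: float = 1.0) -> bool:
    """Pairing to the 5' repeat fragment marks 'self' (no interference).

    The threshold of 1.0 (full pairing) is an assumption made for this sketch.
    """
    return paired_fraction(crrna_tag, target_flank) < self_threshold


# Hypothetical sequences, chosen only to exercise the two branches.
tag = "ACGAGAAC"            # made-up 5' repeat-derived fragment of a crRNA
locus_flank = "TGCTCTTG"    # pairs with the tag at every position: 'self'
invader_flank = "GATTACAG"  # pairs at only a few positions: 'non-self'

self_is_cleaved = interference_expected(tag, locus_flank)
invader_is_cleaved = interference_expected(tag, invader_flank)
```

In this model the chromosomal CRISPR locus is spared because its repeat sequence pairs with the tag, whereas an invading fragment that lacks that pairing is flagged for interference.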
Figure 2
Figure 2. The relationship of the three major types and ten subtypes of CRISPR systems
The typical, simplest operon architectures are shown for each type and subtype of CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) system; numerous variations exist. Orthologous genes are colour coded and identified by a family name, as given in TABLE 2. The signature genes for CRISPR–Cas types are shown within green boxes, and those for subtypes are shown within red boxes. The letters above the genes show major categories of Cas proteins: large CRISPR-associated complex for antiviral defence (Cascade) subunits (L), small Cascade subunits (S), repeat-associated mysterious protein (RAMP) Cascade subunits (R), RAMP family RNases involved in crRNA processing (RE) (note that only those in subtypes I-E, I-F and III-B systems have been characterized), and transcriptional regulators (T). The star indicates a predicted inactivated polymerase with an HD domain. For subtype I-A systems, the cas8a1 and cas8a2 genes are typically mutually exclusive but both can be considered signature genes for the subtype. For type III systems, the cas1 and cas2 genes in dashed boxes are not associated with all type III polymerase–RAMP modules. In addition to previously published data, this schematic shows Cas7 (COG1857) as a member of the RAMP superfamily. For each CRISPR–Cas subtype (except for the newly identified subtype I-D), the old names from REFS 13,14 are indicated in parentheses. Figure is modified, with permission, from REF. 14 © (2006) BioMed Central.
Figure 3
Figure 3. Phylogenetic tree for Cas1 (COG1518) proteins
The BLASTCLUST program was used to cluster the sequences of CRISPR (clustered regularly interspaced short palindromic repeats)-associated protein 1 (Cas1) by similarity (parameters: the sequence length to be covered was 75%, and the score identity threshold was 0.9), and one representative from each cluster was chosen (see the list in Supplementary information S4 (table)). Six major subtypes of type I CRISPR–Cas system (I-A to I-F), as well as type II and type III systems, are colour coded. Dashed lines show cas1 genes that are found in 'hybrid' CRISPR loci containing genes from both type I and type III CRISPR–Cas systems (see main text for details). Subtypes I-U and III-U (U for unclassified) denote CRISPR–Cas systems that lack currently defined subtype-specific signature genes (see main text for details). The maximum likelihood tree was constructed using the PHYML program, from 182 informative positions in the multiple alignment of a representative set of 228 Cas1 proteins from 442 complete genomes (those that encode Cas1 from the set of 703 genomes listed in Supplementary information S1 (table)). For each CRISPR–Cas subtype (except for the newly identified subtype I-D), the old names from REFS 13, are indicated in parentheses.
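The redundancy-reduction step in this caption (cluster by similarity, then keep one representative per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the BLASTCLUST implementation: the pairwise scores are invented toy values, the way BLASTCLUST computes its score is not modelled, the function names and sequence identifiers are hypothetical, and choosing the longest sequence as the representative is an assumption, since the caption does not say how representatives were picked. Only the two thresholds (0.75 coverage, 0.9 score) come from the caption.

```python
from collections import defaultdict


def single_linkage_clusters(ids, hits, min_score=0.9, min_coverage=0.75):
    """Group sequences by single linkage using a union-find structure.

    hits: iterable of (id_a, id_b, score, coverage) tuples.
    Two sequences are linked when both thresholds are met; clusters are
    the connected components of the resulting graph.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for id_a, id_b, score, coverage in hits:
        if score >= min_score and coverage >= min_coverage:
            root_a, root_b = find(id_a), find(id_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for seq_id in ids:
        clusters[find(seq_id)].append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


def pick_representatives(clusters, lengths):
    """Pick the longest member of each cluster (assumed rule; ties by id)."""
    return [min(members, key=lambda s: (-lengths[s], s)) for members in clusters]


# Toy input: identifiers, scores and lengths are made up for illustration.
ids = ["cas1_a", "cas1_b", "cas1_c", "cas1_d"]
hits = [
    ("cas1_a", "cas1_b", 0.95, 0.80),  # linked
    ("cas1_b", "cas1_c", 0.92, 0.78),  # linked, so a, b and c share a cluster
    ("cas1_c", "cas1_d", 0.93, 0.60),  # coverage too low, not linked
]
lengths = {"cas1_a": 320, "cas1_b": 331, "cas1_c": 318, "cas1_d": 305}

clusters = single_linkage_clusters(ids, hits)
representatives = pick_representatives(clusters, lengths)
```

Single linkage means that cas1_a and cas1_c end up in the same cluster through cas1_b even though no direct hit between them is listed, which is why one representative can stand in for a chain of related sequences.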
