BMC Bioinformatics, 8, 172

The CRISPRdb Database and Tools to Display CRISPRs and to Generate Dictionaries of Spacers and Repeats



Ibtissem Grissa et al. BMC Bioinformatics.


Background: In Archaea and Bacteria, the repeated elements called CRISPRs, for "clustered regularly interspaced short palindromic repeats", are believed to participate in defence against viruses. Short sequences called spacers are stored between the repeated elements. In the current model, motifs comprising spacers and repeats may target invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, much remains to be discovered about how CRISPRs are created and evolve. As new genome sequences become available, automated scanning tools are needed to make CRISPR-related information available and to facilitate further investigations.

Description: We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, we constructed a database that is automatically updated monthly from newly released genome sequences. Additional tools were created to align flanking sequences in search of similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two Archaea out of thirty-seven and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR, adjacent to the putative promoter.
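
As an illustration of what extracting the unique sequences and building a dictionary could look like, here is a minimal Python sketch. It is not CRISPRFinder's code. It assumes every direct repeat (DR) in a locus is an exact copy of one consensus string, whereas real loci can contain variant DRs. The function names `extract_spacers` and `build_dictionary` are hypothetical.

```python
from typing import Dict, Iterable, List


def extract_spacers(locus: str, dr: str) -> List[str]:
    """Return the sequences lying between consecutive exact copies of the DR.

    Assumption: the DR appears as an identical string each time. The text
    before the first DR and after the last DR is flanking sequence, not a
    spacer, so it is dropped.
    """
    parts = locus.split(dr)
    return parts[1:-1]


def build_dictionary(spacer_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign a stable integer code to each distinct spacer, in order of first appearance."""
    dictionary: Dict[str, int] = {}
    for spacers in spacer_lists:
        for spacer in spacers:
            # Only the first occurrence of a spacer receives a new code.
            dictionary.setdefault(spacer, len(dictionary) + 1)
    return dictionary


if __name__ == "__main__":
    dr = "GTTCCA"  # toy repeat, not a real DR
    locus = "TT" + dr + "AAAC" + dr + "GGGT" + dr + "CC"
    print(extract_spacers(locus, dr))
    print(build_dictionary([["AAAC", "GGGT"], ["GGGT", "TTTA"]]))
```

With such a dictionary, each strain's CRISPR can be written as a list of spacer codes, which makes spacers shared between strains easy to spot.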

Conclusion: It is hoped that a public database that is regularly updated and can be queried on the web will help in further dissecting and understanding the evolution of CRISPR structure and flanking sequences. Subsequent analyses of intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the dictionary creator. CRISPRdb is accessible at


Figure 1
An entity-relationship diagram for the CRISPR database. The downloaded data are represented in the yellow box: on the left the taxonomy report information and on the right the "GenomeInfo" report information about species replicons (chromosome or plasmid). The pink box represents tables related to the CRISPR clusters: a table for the cluster locus, a table for the DR consensus and a table for the spacers.
Figure 2
The database construction: from genomes to CRISPRs. The first step consists of downloading prokaryotic genomes, which are then submitted to the CRISPRFinder program. The detected clusters are divided into two groups: confirmed CRISPRs (>=3 DRs) are stored in the database; small questionable clusters (2 or 3 DRs) are analysed by blasting their conserved region (DR) against the approved DRs, and clusters with already identified DRs are added to the CRISPR database. The remaining questionable CRISPRs are analysed for classical flanking nucleotides and for spacer length compared to the DR length. Clusters that do not fit these criteria are deleted; the rest are kept as questionable. The database curator can also discard some sequences manually. Colour code: programs are shown in blue, confirmed CRISPRs are in pink and questionable ones are in grey.
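
The triage described in this caption can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. The BLAST comparison is replaced by exact membership in a set of approved DRs. The flanking-nucleotide check is a caller-supplied predicate, because the caption does not define "classical". The spacer-to-DR length bounds in `ratio_bounds` are assumed values. The caption lists 3 DRs in both groups, so the `min_confirmed_drs` boundary is also an assumption. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Cluster:
    dr_consensus: str
    spacers: List[str]

    @property
    def n_drs(self) -> int:
        # Assumption: an array of n DRs encloses n - 1 spacers.
        return len(self.spacers) + 1


def classify(
    cluster: Cluster,
    approved_drs: Set[str],
    has_classical_flanks: Callable[[Cluster], bool],
    min_confirmed_drs: int = 3,                     # assumed boundary
    ratio_bounds: Tuple[float, float] = (0.6, 2.5)  # assumed length ratios
) -> str:
    """Return 'confirmed', 'questionable' or 'discarded' for one cluster."""
    # Step 1: large clusters are confirmed directly.
    if cluster.n_drs >= min_confirmed_drs:
        return "confirmed"
    # Step 2: a small cluster whose DR is already approved is promoted
    # (exact match here stands in for the BLAST comparison).
    if cluster.dr_consensus in approved_drs:
        return "confirmed"
    # Step 3: the rest must pass the flank and spacer-length checks.
    low, high = ratio_bounds
    dr_len = len(cluster.dr_consensus)
    spacers_ok = all(low * dr_len <= len(s) <= high * dr_len for s in cluster.spacers)
    if has_classical_flanks(cluster) and spacers_ok:
        return "questionable"
    return "discarded"


if __name__ == "__main__":
    small = Cluster("A" * 30, ["C" * 32])
    print(classify(small, set(), lambda c: True))
```

Manual curation, the final step in the caption, would sit outside such a function, for example as a list of cluster identifiers to exclude.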
Figure 3
Screenshots of the CRISPRs web-service. 1. The opening page of the prokaryotic strains: strains in pink have at least one CRISPR, strains in grey have only questionable CRISPRs and strains in yellow have no CRISPR. 2. General information on the CRISPR clusters and their location. 3. Detailed information on the clusters: DRs are in yellow, spacers are in random colours. 4. Link to the spacers fasta file.
Figure 4
The DR comparison tool. Screenshot from the Utilities page showing the list of DRs with an alignment example.
Figure 5
The first and last 17 motifs of CRISPR NC_007503_3 from Carboxydothermus hydrogenoformans Z-2901. The DRs shared by the two CRISPR loci NC_007503_3 and NC_007503_4 are shown in yellow and the variant DR observed only in NC_007503_3 is in red. CRISPR units (DR + spacer) are numbered on the left and spacers' length is indicated on the right.




