Nucleic Acids Res., 41 (10), e105

Crass: Identification and Reconstruction of CRISPR From Unassembled Metagenomic Data

Connor T Skennerton et al. Nucleic Acids Res.

Abstract

Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.

Figures

Figure 1.
Construction and refinement of the preliminary and final spacer graph. A schematic illustrating graph construction and potential problems in determining the correct spacer order. (A) An arrangement of four spacers representing CRISPR spacer heterogeneity, where spacer 2 and spacer 4 are both connected to spacer 1 and spacer 3. Sequencing reads that contain these spacers are shown (grey bars), some of which contain spacer 2 and others that contain spacer 4 (without hatching). Sequencing errors are marked with black circles. Incomplete spacer sequences that are found in some reads are marked with hash, plus and asterisk symbols. (B) Each read is used to create a small portion of a preliminary spacer graph (p-graph). Nodes are created from k-mers, which are cut from the ends of each spacer region (delineated by dashed vertical lines). Edges are either ‘inner’ edges, connecting nodes from the same spacer (solid arrow), or ‘jumping’ edges between different spacers (dashed arrow). (C) The initial version of the p-graph is produced by combining nodes derived from all reads, including k-mers from incomplete spacer sequences. (D) The p-graph after removal of fur caused by sequencing errors or incomplete spacers. (E) Pairs of nodes joined by inner edges are concatenated together to form spacer-nodes in the spacer graph. Jumping edges remain in the spacer graph, as they represent a DR sequence. (F) Each node now represents a correctly ordered spacer in the final spacer graph.
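The graph construction in panels B to F can be illustrated with a minimal Python sketch. It is not the Crass implementation: it assumes the spacers of each read have already been extracted as complete strings of at least k bases, and it stands in for fur removal with a simple coverage cutoff. The function names and the `min_cov` parameter are hypothetical and chosen only for this example.

```python
from collections import Counter


def build_p_graph(reads, k):
    """Build a toy preliminary spacer graph (p-graph).

    reads: list of reads, each a list of spacer strings in read order.
    Assumption: every spacer is complete and at least k bases long.
    Returns two Counters of edge coverage: inner edges and jumping edges.
    """
    inner = Counter()    # (first k-mer, last k-mer) of the same spacer
    jumping = Counter()  # (last k-mer of a spacer, first k-mer of the next)
    for spacers in reads:
        ends = [(s[:k], s[-k:]) for s in spacers]
        for pair in ends:
            inner[pair] += 1
        for (_, last), (first, _) in zip(ends, ends[1:]):
            jumping[(last, first)] += 1
    return inner, jumping


def build_spacer_graph(inner, jumping, min_cov=2):
    """Collapse inner edges into spacer nodes and keep jumping edges.

    Assumption: dropping edges seen fewer than min_cov times is used here
    as a simple stand-in for the fur removal described in the caption.
    """
    nodes = {pair for pair, n in inner.items() if n >= min_cov}
    by_first, by_last = {}, {}
    for node in nodes:
        by_first.setdefault(node[0], []).append(node)
        by_last.setdefault(node[1], []).append(node)
    edges = set()
    for (last, first), n in jumping.items():
        if n < min_cov:
            continue
        for src in by_last.get(last, []):
            for dst in by_first.get(first, []):
                edges.add((src, dst))
    return nodes, edges


# Toy data mirroring panel A: spacer 1 is followed by spacer 2 or spacer 4,
# and both lead to spacer 3. One read carries an erroneous copy of spacer 2.
s1, s2, s3, s4 = "AAAACCCCGGGG", "TTTTACGTAAAC", "GTGTTTTTCCCC", "CATGCATGTTAG"
s2_err = "TTTTACGTAAAG"
reads = [[s1, s2, s3]] * 2 + [[s1, s4, s3]] * 2 + [[s1, s2_err]]

inner, jumping = build_p_graph(reads, k=4)
nodes, edges = build_spacer_graph(inner, jumping, min_cov=2)
```

In this toy input the erroneous spacer is seen only once, so the cutoff removes it, and the resulting spacer graph keeps both paths from spacer 1 to spacer 3.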
Figure 2.
Comparison between different CRISPR loci visualization techniques. (A) Traditional approach to visualization, where the spacers are shown as differently coloured rectangles (the same colour refers to the same spacer) anchored to the leader sequence (white triangle). (B) The same CRISPR loci reconstructed by Crass into a spacer graph.
Figure 3.
Summary of the number of repeats and spacers identified by Crass in comparison with the original analyses. The number of shared DRs and spacers for the AMD and GOS data sets are shown in the central white section of each Venn diagram. Sequences detected only by Crass are coloured grey, and those only found in the original analyses are coloured black.
Figure 4.
Identification of DR types and total spacer count in the EBPR microbial metagenome. DR types identified by Crass are shown along the x-axis. Grey bars correspond to the number of spacers found for each DR type by Crass and black bars to the number found by PILER-CR.
Figure 5.
Reconstruction of the spacer arrangement of the most abundant CRISPR loci in the EBPR microbial metagenome. Each circle represents a spacer, and the lines connecting each spacer represent their positioning relative to other spacers. A spacer can be joined onto any number of other spacers (which indicates strain diversity in the population) and is coloured on a linear scale from blue to red, based on its coverage. The leader sequence (green circle) and distal end (grey circle) of the CRISPR are shown. There are two main spacer arrangements (A and B) from the leader to the tail region that merge into a conserved tail (D). A third arrangement contains unconnected spacers that may link into the leader sequence (C).
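The linear blue-to-red coverage scale mentioned in the caption could be computed along the following lines. This is only a sketch: the function name, the RGB encoding and the clamping behaviour are assumptions made for illustration and are not taken from Crass.

```python
def coverage_to_rgb(cov, lo, hi):
    """Map a spacer coverage value onto a linear blue-to-red RGB scale.

    lo and hi are the coverage values drawn as pure blue and pure red.
    Values outside [lo, hi] are clamped (an assumption of this sketch).
    """
    if hi <= lo:
        return (0, 0, 255)  # degenerate range: draw everything blue
    t = min(max((cov - lo) / (hi - lo), 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


lowest = coverage_to_rgb(1, lo=1, hi=11)    # pure blue
highest = coverage_to_rgb(11, lo=1, hi=11)  # pure red
```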


