Mapping CRISPR spaceromes reveals vast host-specific viromes of prokaryotes

Commun Biol. 2020 Jun 22;3(1):321. doi: 10.1038/s42003-020-1014-1.

Abstract

CRISPR arrays contain spacers, some of which are homologous to genome segments of viruses and other parasitic genetic elements and are employed as portions of guide RNAs to recognize and specifically inactivate the target genomes. However, the fraction of the spacers in sequenced CRISPR arrays that reliably match protospacer sequences in genomic databases is small, leaving the question of the origin(s) open for the great majority of the spacers. Here, we extend the spacer analysis by examining the distribution of partial matches (matching k-mers) between spacers and genomes of viruses infecting the given host as well as the host genomes themselves. The results indicate that most of the spacers originate from the host-specific viromes, whereas self-targeting is strongly selected against. However, we present evidence that the vast majority of the viruses comprising the viromes currently remain unknown, although they are likely to be related to identified viruses.
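
The idea of a partial (k-mer) match between a spacer and a genome can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the default k of 8, the scoring as a simple fraction of shared k-mers, and the toy sequences are all assumptions made for illustration only.

```python
from typing import Set

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_match_fraction(spacer: str, genome: str, k: int = 8) -> float:
    """Fraction of distinct spacer k-mers found on either strand of genome.

    The default k and this scoring scheme are illustrative assumptions,
    not values or statistics taken from the article.
    """
    spacer_kmers = kmers(spacer, k)
    if not spacer_kmers:
        # Spacer shorter than k: no k-mers to compare.
        return 0.0
    genome_kmers = kmers(genome, k) | kmers(reverse_complement(genome), k)
    return len(spacer_kmers & genome_kmers) / len(spacer_kmers)


# Toy sequences (hypothetical, not real spacers or viral genomes).
spacer = "ACGTACGGTCA"
virus_like = "TTTT" + spacer + "GGGG"
unrelated = "T" * 30

full = kmer_match_fraction(spacer, virus_like)
none = kmer_match_fraction(spacer, unrelated)
```

Under these assumptions, a spacer embedded in a genome on either strand scores 1.0 and an unrelated sequence scores 0.0. Comparing the distribution of such scores for viral genomes against that for host genomes is the kind of contrast the abstract describes, although the article's actual statistics may differ.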

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adaptation, Biological / genetics
  • Bacteria / genetics
  • Bacteria / virology
  • Clustered Regularly Interspaced Short Palindromic Repeats*
  • Escherichia coli / genetics
  • Escherichia coli / virology
  • Genome
  • Host-Pathogen Interactions / genetics
  • Prokaryotic Cells / virology*
  • Proviruses / genetics
  • Virome / genetics*