Identification of a novel family of sequence repeats among prokaryotes

OMICS. 2002;6(1):23-33. doi: 10.1089/15362310252780816.

Abstract

The rapid increase in genomic sequences provides new opportunities for comparative genomics. In this report, we describe a novel family of repeat sequences that is present in Bacteria and Archaea but not in Eukarya. The repeat loci typically consisted of repetitive stretches of nucleotides, 25 to 37 bp in length, alternating with nonrepetitive DNA spacers of approximately the same size as the repeats. The nucleotide sequences and the size of the repeats were highly conserved within a species, but between species the sequences showed no similarity. Because of their characteristic structure, we have designated this family of repeat loci SPacers Interspersed Direct Repeats (SPIDR). SPIDR loci were identified in more than forty different prokaryotic species. Some species, such as Mycobacterium tuberculosis, contained a single SPIDR locus, while others, such as Methanococcus jannaschii, contained up to 20 different loci. The number of repeats in a locus varied greatly, from two to several dozen. The SPIDR loci were flanked by a common 300-500-bp leader sequence, which appeared to be conserved within a species but not between species. The SPIDR locus of M. tuberculosis is extensively used for strain typing. The finding of SPIDR loci in other prokaryotes, including the pathogens Salmonella, Campylobacter, and Pasteurella, may extend this surveillance to other species.
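The locus structure described above (direct repeats of a fixed length separated by spacers of similar size) can be illustrated with a minimal Python sketch. This is not the software used in the study. The function name `find_spidr_like_loci`, its parameters, the spacer bounds, and the synthetic test sequence are all hypothetical, and the sketch matches repeats exactly, whereas real repeat copies may differ slightly.

```python
import random
from collections import defaultdict


def find_spidr_like_loci(seq, repeat_len=30, min_copies=3,
                         min_spacer=20, max_spacer=45):
    """Return (repeat, start_positions) pairs for exact direct repeats
    of length repeat_len that recur with spacer-sized gaps.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    # Index every window of length repeat_len by its sequence.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    loci = []
    for kmer, starts in positions.items():
        if len(starts) < min_copies:
            continue
        # Keep the longest run of copies whose gaps look like spacers.
        best, chain = [], [starts[0]]
        for prev, cur in zip(starts, starts[1:]):
            spacer = cur - prev - repeat_len
            if min_spacer <= spacer <= max_spacer:
                chain.append(cur)
            else:
                if len(chain) > len(best):
                    best = chain
                chain = [cur]
        if len(chain) > len(best):
            best = chain
        if len(best) >= min_copies:
            loci.append((kmer, best))
    return loci


# Synthetic locus: four copies of a made-up 30-bp repeat, three random
# 32-bp spacers, and simple flanks (all invented for illustration).
rng = random.Random(0)
repeat = "GTTCACTGCCGTACAGGCAGCTTAGAAATG"
spacers = ["".join(rng.choice("ACGT") for _ in range(32)) for _ in range(3)]
seq = "A" * 50 + repeat + "".join(s + repeat for s in spacers) + "T" * 50

loci = find_spidr_like_loci(seq, repeat_len=len(repeat))
for kmer, starts in loci:
    print(kmer, starts)
```

Indexing fixed-length windows keeps the sketch short. A realistic search would also need to try every repeat length in the 25 to 37 bp range and tolerate mismatches between copies.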

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Archaea / genetics*
  • Bacteria / genetics*
  • Base Sequence
  • DNA, Bacterial
  • Repetitive Sequences, Nucleic Acid*
  • Software

Substances

  • DNA, Bacterial