Simple sequence repeats in the Helicobacter pylori genome

Mol Microbiol. 1998 Mar;27(6):1091-8. doi: 10.1046/j.1365-2958.1998.00768.x.


We describe an integrated system for the analysis of DNA sequence motifs within complete bacterial genome sequences. This system is based around ACeDB, a genome database with an integrated graphical user interface; we identify and display motifs in the context of genetic, sequence and bibliographic data. Tomb et al. (1997) previously reported the identification of contingency genes in Helicobacter pylori through their association with homopolymeric tracts and dinucleotide repeats. With this as a starting point, we validated the system by a search for this type of repeat and used the contextual information to assess the likelihood that they mediate phase variation in the associated open reading frames (ORFs). We found all of the repeats previously described, and identified 27 putative phase-variable genes (including 17 previously described). These could be divided into three groups: lipopolysaccharide (LPS) biosynthesis, cell-surface-associated proteins and DNA restriction/modification systems. Five of the putative genes did not have obvious homologues in any of the public domain sequence databases. The reading frame of some ORFs was disrupted by the presence of the repeats, including the alpha(1-2) fucosyltransferase gene, necessary for the synthesis of the Lewis Y epitope. An additional benefit of this approach is that the results of each search can be analysed further and compared with those from other genomes. This revealed that H. pylori has an unusually high frequency of homopurine:homopyrimidine repeats, suggesting mechanistic biases that favour their presence and instability.
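The kind of repeat search described in the abstract can be illustrated with a minimal sketch. This is not the ACeDB-based system used in the paper; it is a plain regular-expression scan, and the function names (`find_simple_repeats`, `shifts_frame`) and the length thresholds are assumptions chosen for illustration only. It reports homopolymeric tracts and dinucleotide repeats in a DNA string, and shows why a change in copy number of such a repeat can disrupt a reading frame.

```python
import re

# Assumed thresholds (not taken from the paper): minimum sizes worth reporting.
MIN_HOMOPOLYMER = 8   # at least 8 identical bases, e.g. AAAAAAAA
MIN_DINUC_UNITS = 5   # at least 5 copies of a 2-base unit, e.g. CTCTCTCTCT


def find_simple_repeats(seq, min_homo=MIN_HOMOPOLYMER, min_di=MIN_DINUC_UNITS):
    """Return sorted (start, end, unit, copies) tuples.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    seq = seq.upper()
    hits = []
    # Homopolymeric tracts: one base followed by itself min_homo - 1 or more times.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_homo - 1), seq):
        hits.append((m.start(), m.end(), m.group(1), m.end() - m.start()))
    # Dinucleotide repeats: a unit of two *different* bases, repeated in tandem.
    # The lookahead (?!\2) stops homopolymers from being reported twice.
    for m in re.finditer(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_di - 1), seq):
        hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // 2))
    return sorted(hits)


def shifts_frame(unit_len, delta_copies):
    """True if gaining or losing delta_copies units changes the reading frame.

    The frame is preserved only when the net length change is a multiple of 3.
    """
    return (unit_len * delta_copies) % 3 != 0


if __name__ == "__main__":
    # Toy sequence, invented for the example (not H. pylori data).
    example = "GATC" + "A" * 9 + "GGTC" + "CT" * 6 + "GA"
    for start, end, unit, copies in find_simple_repeats(example):
        print(start, end, unit, copies)
```

In this sketch a single-base slip in a homopolymeric tract (`shifts_frame(1, 1)`) always changes the frame, whereas a three-base change does not. Deciding whether a given repeat plausibly mediates phase variation would still require the surrounding ORF context that the paper draws from its database.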

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • DNA Restriction-Modification Enzymes / genetics
  • Databases, Factual
  • Dinucleotide Repeats / genetics
  • Fucosyltransferases / genetics
  • Genome, Bacterial*
  • Helicobacter pylori / genetics*
  • Lipopolysaccharides / biosynthesis
  • Membrane Proteins / genetics
  • Molecular Sequence Data
  • Open Reading Frames / genetics
  • Repetitive Sequences, Nucleic Acid / genetics*
  • Sequence Analysis, DNA / methods*
  • Software


Substances

  • DNA Restriction-Modification Enzymes
  • Lipopolysaccharides
  • Membrane Proteins
  • Fucosyltransferases