Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12

DNA Res. 2001 Feb 28;8(1):11-22. doi: 10.1093/dnares/8.1.11.


Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak, and the results of a genomic comparison with a benign laboratory strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 kb larger than that of K-12. We identified a 4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence comprises O157:H7-specific sequences, most of which are horizontally transferred foreign DNA. The predominant role of bacteriophages in the emergence of O157:H7 is evident from the presence of 24 prophages and prophage-like elements that occupy more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in the efficient expression of the strain-specific genes. The complete set of O157:H7-specific genes presented here provides new insight into the pathogenicity and physiology of O157:H7, and opens the way to a full understanding of the molecular mechanisms underlying O157:H7 infection.
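
As an illustration of the kind of codon usage comparison mentioned above, the following is a minimal sketch of how codon frequencies in two sets of coding sequences might be tallied. It is not the authors' method: the function names and the two toy sequences are hypothetical and chosen arbitrarily, and a real analysis would read annotated coding sequences from the deposited GenBank records.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (frame 0).

    A trailing partial codon, if any, is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(counts: Counter) -> dict:
    """Convert raw codon counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy sequences, NOT taken from the O157:H7 or K-12 genomes.
backbone_like = "ATGAAAGCGCTGAAATAA"
specific_like = "ATGAGAATAAGAATATAA"

backbone_freq = codon_frequencies(codon_counts(backbone_like))
specific_freq = codon_frequencies(codon_counts(specific_like))

# Report codons whose frequency differs between the two sets.
for codon in sorted(set(backbone_freq) | set(specific_freq)):
    b = backbone_freq.get(codon, 0.0)
    s = specific_freq.get(codon, 0.0)
    if b != s:
        print(f"{codon}: backbone={b:.2f} specific={s:.2f}")
```

Codons that are enriched in the strain-specific gene set relative to the conserved backbone would be the natural candidates to check against the anticodons of the strain-specific tRNAs.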

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / genetics
  • Base Composition
  • Base Sequence
  • DNA, Bacterial
  • DNA, Circular
  • Disease Outbreaks
  • Escherichia coli / genetics
  • Escherichia coli Infections / microbiology*
  • Escherichia coli O157 / genetics*
  • Escherichia coli O157 / pathogenicity
  • Evolution, Molecular
  • Genetic Code
  • Genome, Bacterial*
  • Interspersed Repetitive Sequences
  • Lysogeny
  • Molecular Sequence Data
  • Open Reading Frames
  • RNA, Bacterial / genetics
  • Virulence / genetics

Substances

  • Bacterial Proteins
  • DNA, Bacterial
  • DNA, Circular
  • RNA, Bacterial

Associated data

  • GENBANK/AP002550
  • GENBANK/AP002551
  • GENBANK/AP002552
  • GENBANK/AP002553
  • GENBANK/AP002554
  • GENBANK/AP002555
  • GENBANK/AP002556
  • GENBANK/AP002557
  • GENBANK/AP002558
  • GENBANK/AP002559
  • GENBANK/AP002560
  • GENBANK/AP002561
  • GENBANK/AP002562
  • GENBANK/AP002563
  • GENBANK/AP002564
  • GENBANK/AP002565
  • GENBANK/AP002566
  • GENBANK/AP002567
  • GENBANK/AP002568
  • GENBANK/AP002569
  • GENBANK/BA000007