Proc Natl Acad Sci U S A, 98 (8), 4658-63

Complete Genome Sequence of an M1 Strain of Streptococcus Pyogenes


J J Ferretti et al. Proc Natl Acad Sci U S A.


The 1,852,442-bp sequence of an M1 strain of Streptococcus pyogenes, a Gram-positive pathogen, has been determined and contains 1,752 predicted protein-encoding genes. Approximately one-third of these genes have no identifiable function, with the remainder falling into previously characterized categories of known microbial function. Consistent with the observation that S. pyogenes is responsible for a wider variety of human disease than any other bacterial species, more than 40 putative virulence-associated genes have been identified. Additional genes have been identified that encode proteins likely associated with microbial "molecular mimicry" of host characteristics and involved in rheumatic fever or acute glomerulonephritis. The complete or partial sequence of four different bacteriophage genomes is also present, with each containing genes for one or more previously undiscovered superantigen-like proteins. These prophage-associated genes encode at least six potential virulence factors, emphasizing the importance of bacteriophages in horizontal gene transfer and a possible mechanism for generating new strains with increased pathogenic potential.


Figure 1
Circular representation of the S. pyogenes strain SF370 genome. Outer circle, predicted coding regions transcribed on the forward (clockwise) DNA strand. Second circle, predicted coding regions transcribed on the reverse (counterclockwise) DNA strand. Third circle, stable RNA molecules. Fourth circle, mobile genetic elements: burgundy, bacteriophage; blue, transposons/IS elements; light cyan, transposons/IS elements (pseudogenes). Fifth circle, known and putative virulence factors: purple, previously identified ORFs; brown, ORFs identified as a result of genome sequencing. The lines in each concentric circle indicate the position of the represented feature. Colors: dark gray, amino acid transport and metabolism; light gray, carbohydrate transport and metabolism; green, cell division and chromosome partitioning; olive green, cell envelope biogenesis, outer membrane; salmon, cell motility and secretion; tan, coenzyme metabolism; violet, DNA replication, recombination and repair; yellow, energy production and conversion; light pink, function unknown; rose, general function prediction only; light brown, inorganic ion transport and metabolism; light purple, lipid metabolism; light blue, nucleotide transport and metabolism; orange, posttranslational modification, protein turnover, chaperones; red, signal transduction mechanisms; cyan, transcription; green, translation, ribosomal structure and biogenesis; purple, virulence factors; magenta, stable RNA; burgundy, bacteriophage; medium blue, pseudogenes; brown, newly identified virulence factors; blue, transposons/IS elements.
Figure 2
%G+C profiles of phage genomes. The average %G+C (100-base window) is plotted along the length of each complete phage genome, with residue numbers on the horizontal axis. The regions encoding the known or putative virulence factors associated with each phage are enclosed in boxes; these regions all show a marked decrease in average %G+C compared with the remainder of the genomes. Analysis was done by using the Genetics Computer Group software package.
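The windowed %G+C calculation described in this caption can be illustrated with a short Python sketch. This is not the authors' method: the analysis was done with the Genetics Computer Group package, and the caption does not say whether the 100-base windows overlap. The function name `gc_percent_windows` and its `step` parameter are hypothetical, and the input is a toy sequence rather than phage data.

```python
def gc_percent_windows(seq, window=100, step=100):
    """Return a list of (start_position, percent_gc) for each full window.

    start_position is 1-based, like residue numbers on a plot axis.
    step == window gives non-overlapping windows; a smaller step gives
    a sliding window. Which one the original analysis used is an assumption.
    """
    seq = seq.upper()
    profile = []
    # Only full-length windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        profile.append((start + 1, 100.0 * gc_count / window))
    return profile


if __name__ == "__main__":
    # Toy sequence: a G+C-rich stretch, an A+T-rich stretch, then a mixed one.
    toy = "GC" * 50 + "AT" * 50 + "GCAT" * 25
    for position, percent in gc_percent_windows(toy):
        print(position, percent)
```

On a real phage sequence, a stretch of windows with values well below the genome-wide average would correspond to the kind of low-%G+C region the caption describes for the virulence-factor genes.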
Figure 3
Phylogram of superantigen-like proteins identified in S. pyogenes SF370. The protein alignment was generated by using ClustalX (with the BLOSUM matrix and 1,000 bootstrap trials). The graphical representation of the tree was generated by using TreeView. Gene products encoded by SF370: red; encoded by S. pyogenes but not present in SF370: green; encoded by S. aureus: blue. Scale bar represents the length of the branches. Bootstrap values are displayed at each internal node. Note: SpeK is present in SF370 as a partial product only; an intact copy of speK has not yet been identified in S. pyogenes. Gene products encoded by S. pyogenes are in red, with those proteins specifically encoded by strain SF370 also enclosed in a box. The products encoded by S. aureus are in blue. GenBank accession nos.: S. pyogenes proteins: SSA (gb: AAA65928.1); SpeA (gb: AAC48868.1). S. aureus proteins: SEA (prf:1704203A); SEB (gb: AAA88550.1); SEC2 (gb: AAA26624.1); SED (gb: AAB06195.1); SEE (gb: AAA26617.1); SEGv (dbj: BAA36693.1); SEH (gb: AAA19777.1); SEI (gb: AAC26661.1); SEJ (gb: AAC78590.1); SEL (gb: AAG29598.1); SEM (gb: AAG36952.1); TSST (gb: AAA26682.1). Supplementary information is available on the World Wide Web sites of our laboratories at the University of Oklahoma and the University of Oklahoma Health Sciences Center.
