J Bacteriol, 190 (23), 7773-85

Genome Sequence of a Nephritogenic and Highly Transformable M49 Strain of Streptococcus Pyogenes

W Michael McShan et al. J Bacteriol.

Abstract

The 1,815,783-bp genome of a serotype M49 strain of Streptococcus pyogenes (group A streptococcus [GAS]), strain NZ131, has been determined. This GAS strain (FCT type 3; emm pattern E), originally isolated from a case of acute post-streptococcal glomerulonephritis, is unusually competent for electrotransformation and has been used extensively as a model organism for both basic genetic and pathogenesis investigations. As with the previously sequenced S. pyogenes genomes, three unique prophages are a major source of genetic diversity. Two clustered regularly interspaced short palindromic repeat (CRISPR) regions were present in the genome, providing genetic information on previous prophage encounters. A unique cluster of genes was found in the pathogenicity island-like emm region that included a novel Nudix hydrolase, and, further, this cluster appears to be specific for serotype M49 and M82 strains. Nudix hydrolases eliminate potentially hazardous materials or prevent the unbalanced accumulation of normal metabolites; in bacteria, these enzymes may play a role in host cell invasion. Since M49 S. pyogenes strains have been known to be associated with skin infections, the Nudix hydrolase and its associated genes may have a role in facilitating survival in an environment that is more variable and unpredictable than the uniform warmth and moisture of the throat. The genome of NZ131 continues to shed light upon the evolutionary history of this human pathogen. Apparent horizontal transfer of genetic material has led to the existence of highly variable virulence-associated regions that are marked by multiple rearrangements and genetic diversification while other regions, even those associated with virulence, vary little between genomes. The genome regions that encode surface gene products that will interact with host targets or aid in immune avoidance are the ones that display the most sequence diversity. Thus, while natural selection favors stability in much of the genome, it favors diversity in these regions.

Figures

FIG. 1.
Circular representation of the S. pyogenes strain NZ131 genome. The outer circle shows COG functional categories of coding regions in the clockwise direction. The lines in each concentric circle indicate the position of the represented feature; the color key is shown to the right of the map. The second circle shows predicted coding regions transcribed on the forward (clockwise) DNA strand. The third circle shows predicted coding regions transcribed on the reverse (counterclockwise) DNA strand. The fourth circle shows COG functional categories of coding regions in the counterclockwise direction. The fifth circle shows mobile genetic elements and bacteriophage genomes (red). The sixth circle shows rRNA operons (dark gray). The seventh and eighth circles show the percent G+C content of the sequence and the percent G+C deviation by strand, respectively.
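The two innermost tracks described above can be approximated with a sliding-window calculation. The sketch below is only illustrative: it assumes that "G+C deviation by strand" corresponds to the conventional GC skew, (G - C)/(G + C), and the window and step sizes are arbitrary choices, since the caption states neither the formula nor the parameters used for the map.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); assumed here to match the caption's
    'G+C deviation by strand' (the source does not give the formula)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, percent G+C, GC skew) for each full window.
    Window and step are hypothetical parameters, not taken from the paper."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_percent(chunk), gc_skew(chunk)


if __name__ == "__main__":
    toy = "GGGCATATGCGCATTA"  # toy sequence, not NZ131 data
    for start, pct, skew in sliding_windows(toy, window=4, step=4):
        print(start, round(pct, 1), round(skew, 2))
```

On a real genome the windows would typically span thousands of bases, and a sign change in the skew track is often used to locate the replication origin and terminus.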
FIG. 2.
Prophages NZ131.2 (A) and NZ131.3 (B). The genetic maps of the two complete prophage genomes found in strain NZ131 are shown. Above each is a multiple alignment of the NZ131 phage genome with the genome prophages that contain significant regions of homology. The predicted ORFs for each prophage are shown below the linear map and are color coded by probable functional region: lysogeny (green), DNA replication (blue), packaging and structural genes (gray), lysis (yellow), and virulence (red). Known genes are indicated next to their ORFs.
FIG. 3.
Comparison of the emm regions from the sequenced GAS genomes. The NZ131 genes from the ∼73-kbp region containing emm are shown at the bottom of the figure with the multiple alignment of this region with the corresponding regions from the other genome strains shown above. The alignment is presented by percentage similarity (grayscale) and by identifying insertions/deletions/substitutions (identified by color). For M types for which the genome has been sequenced more than once, only one example was selected unless significant differences existed. The genes associated with M49 streptococci are shown in green, and the genes associated with IS861 are shown in pink. Some genes are identified to provide orientation: ska (streptokinase), nudABC (Nudix hydrolase cluster [this work]), fbp (fibronectin binding protein), enn49, emm49 (serotype 49 M protein), sof, speB (streptococcal protease SpeB), and mf1 (mitogenic factor).
FIG. 4.
FCT (pilus/T antigen) regions of GAS genome strains. The genes from the streptococcal FCT (pilus/T antigen) region from each genome strain were compared by CLUSTAL W alignment. The M3, M5, M18, and M49 genomes were closely related at the nucleotide level over this region, and all contained the nra regulator gene. By contrast, the remaining genomes (M1, M2, M4, M6, M12, and M28) substituted the rofA gene for nra and were more diverse as a group in general. Since the serotypes that have been analyzed by genome sequencing more than once (M1, M3, and M12) are essentially identical over this range, only one of each serotype is shown. The FCT region type (51) and the emm pattern (13) of each genome are indicated. Highly conserved regions shared by different genomes are indicated by color. The nra gene from the M3 genome and the srtB gene from the M12 genome have possible mutations in their ORFs that cause premature termination of transcription. The following genes are shown: cpa (collagen binding protein), prtF1 (fibronectin binding protein; also known as sfb1), prtF2 (fibronectin binding protein; also known as pflpI and fbaB), and fbp (fibronectin binding protein).
FIG. 5.
The M49 streptococcal Nudix hydrolase region. The location of nudA, nudB, and nudC genes found in the M49 strain NZ131 are shown compared to the corresponding region from the M1 genome strain SF370 (39). Transcriptional analysis maps genes Spy1987, Spy1988, and Spy1989 onto a single polycistronic message in strain SF370. In strain NZ131, this operon has been expanded to five genes by the insertion of the three Nudix-associated genes (see Table S3 in the supplemental material). The predicted promoter is indicated by the encircled P, and the Rho-independent terminator is indicated by the filled circle.
FIG. 6.
M serotype association of nudABC in S. pyogenes. (A) A survey of S. pyogenes strains with various M serotypes or emm types found nudABC genes present in only emm49 and emm82 strains. Representative amplification products following specific PCR for the nudABC genes are shown. The nudABC genes were always present together in any PCR-positive strain. (B) Occurrence of nudA, nudB, and nudC in a variety of M types. Twenty-two relatively conserved signal sequence residues and the first 83 residues of the associated mature M protein genes were analyzed as recommended (http://www.cdc.gov/ncidod/biotech/strep/strepindex.htm). The phylogram based upon the M protein gene (emm) is presented below. M, typing by specific antiserums; emm, typing by nucleotide sequencing of the emm genes.
FIG. 7.
Codon adaptation index (CAI) analysis of nudABC and selected other genes. (A) CAI values are shown for NADase (nga), streptolysin O (slo), streptolysin S (sagA), streptokinase (ska), C5a peptidase (C5a), M protein (emm49), SOF (sof), protease SpeB (speB), mitogenic factor (mf1), prophage protein Spy49_0791, exotoxin SpeH (speH), paratox, streptodornase (sda), hypothetical protein Spy49_1637c, methyltransferase Spy49_1638c, hypothetical protein encoded by nudC, Nudix hydrolase (nudB), hypothetical protein encoded by nudA, and hypothetical protein Spy49_1642c. Prophage-encoded genes are shown in blue, the members of the operon containing nudABC that are found universally in GAS strains are shown in green, and nudABC genes are shown in red. The lines indicate the mean CAI values and limits defined by 2 standard deviations (2× SD) above and below the mean. (B) The percent G+C content for each ORF listed above is shown. The dotted line indicates the average percent G+C content for the total genome.
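The kind of analysis summarized in FIG. 7 can be sketched as follows. This is a minimal illustration of a codon adaptation index in the style of Sharp and Li, not the authors' pipeline: the choice of reference gene set, the 0.5 pseudocount for unobserved codons, and the helper names (`relative_adaptiveness`, `cai`, `flag_outliers`) are all assumptions made for the example. The final helper mirrors the caption's limits of 2 standard deviations around the mean.

```python
import math
from collections import Counter
from itertools import product
from statistics import mean, stdev

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq: str) -> list:
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes) -> dict:
    """Weight of each codon: its count divided by the count of the most
    used synonymous codon in the reference set (an assumed gene set)."""
    counts = Counter(c for gene in reference_genes
                     for c in codons(gene) if c in CODON_TABLE)
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":  # skip stops and single-codon amino acids
            continue
        best = max(counts[syn] for syn, a in CODON_TABLE.items() if a == aa)
        if best:  # amino acid observed in the reference set
            # Assumed convention: unobserved codons get a count of 0.5.
            weights[codon] = max(counts[codon], 0.5) / best
    return weights


def cai(gene: str, weights: dict) -> float:
    """Geometric mean of the weights of the gene's scorable codons."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def flag_outliers(values: dict) -> dict:
    """Return entries lying more than 2 sample SDs from the mean."""
    m, sd = mean(values.values()), stdev(values.values())
    return {name: v for name, v in values.items() if abs(v - m) > 2 * sd}
```

Genes whose CAI or percent G+C falls outside such limits are candidates for recent horizontal acquisition, which is the reading the caption invites for the highlighted genes; the mean and standard deviation here are taken over whatever set of values is passed in, whereas the figure's reference population is not specified in the caption.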
