Review
Philos Trans R Soc Lond B Biol Sci. 2006 Nov 29;361(1475):1917-27.
doi: 10.1098/rstb.2006.1917.

Sequences, sequence clusters and bacterial species

William P Hanage et al. Philos Trans R Soc Lond B Biol Sci. 2006.

Abstract

Whatever else they should share, strains of bacteria assigned to the same species should have house-keeping genes that are similar in sequence. Single gene sequences (or rRNA gene sequences) have too few informative sites to resolve the strains of closely related species, and relationships among similar species may be confounded by interspecies recombination. A more promising approach (multilocus sequence analysis, MLSA) is to concatenate the sequences of multiple house-keeping loci and to observe the patterns of clustering among large populations of strains of closely related named bacterial species. Recent studies have shown that large populations can be resolved into non-overlapping sequence clusters that agree well with species assigned by standard microbiological methods. The use of clustering patterns to inform the division of closely related populations into species has many advantages for poorly studied bacteria (or for re-evaluating well-studied species), as it provides a way of recognizing natural discontinuities in the distribution of similar genotypes. Clustering patterns can be used by expert groups as the basis of a pragmatic approach to assigning species, taking into account whatever additional data are available (e.g. similarities in ecology, phenotype and gene content). The development of large MLSA Internet databases provides both the ability to assign new strains to previously defined species clusters and an electronic taxonomy. The advantages and problems in using sequence clusters as the basis of species assignments are discussed.
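The concatenation step at the heart of MLSA can be illustrated with a minimal sketch. The strain names, locus names and sequences below are hypothetical toy data, and the uncorrected proportion of differing sites (p-distance) is used only for simplicity; it stands in for the model-corrected distances and Bayesian trees used in the figures.

```python
from itertools import combinations


def concatenate_loci(strain_loci, locus_order):
    """Join the aligned sequences of each locus, in a fixed order, into one string."""
    return "".join(strain_loci[locus] for locus in locus_order)


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


# Hypothetical toy data: three strains typed at two house-keeping loci.
strains = {
    "strain1": {"locusA": "ATGCATGC", "locusB": "GGCCTTAA"},
    "strain2": {"locusA": "ATGCATGA", "locusB": "GGCCTTAA"},
    "strain3": {"locusA": "TTGCAAGC", "locusB": "GACCTAAT"},
}
locus_order = ["locusA", "locusB"]

# One concatenated sequence per strain, always in the same locus order.
concatenated = {
    name: concatenate_loci(loci, locus_order) for name, loci in strains.items()
}

# Pairwise distances over the concatenated sequences.
distances = {
    (a, b): p_distance(concatenated[a], concatenated[b])
    for a, b in combinations(sorted(concatenated), 2)
}

for pair, d in sorted(distances.items()):
    print(pair, round(d, 4))
```

In this toy example strain1 and strain2 differ at a single site and would fall in the same cluster, whereas strain3 is clearly separated from both. A real analysis would feed the concatenated alignment to a tree-building or clustering method.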

Figures

Figure 1
Interspecies recombination and its effect on species assignments based on a single gene sequence. The relatedness among isolates of three species is inferred from a tree constructed using the sequences of a single house-keeping gene. Isolates of species A are well resolved from those of species B, and from the strain of the more distantly related species C used as an outgroup. Consider a homologous recombinational event that occurs in a strain of species B (arrow), replacing the single locus used to assign the species with the corresponding sequence from a strain of a relatively divergent species. Now, the strain will not be recognized as a strain of species B and will be incorrectly assigned (dotted line) as more distantly related to species B than the outgroup.
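The effect described in this caption can be sketched numerically. The example below is a toy illustration under stated assumptions: the locus names and sequences are hypothetical, and an uncorrected p-distance is used in place of a tree. It compares how far a recombinant strain appears from a typical strain of its own species when judged on the replaced locus alone and when judged on several concatenated loci.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b) / len(seq_a)


# Hypothetical aligned alleles at four loci for a typical strain of species B.
typical_b = {
    "locus1": "AAAAAAAAAA",
    "locus2": "CCCCCCCCCC",
    "locus3": "GGGGGGGGGG",
    "locus4": "TTTTTTTTTT",
}

# The same strain after recombination has replaced locus1 with an allele
# from a divergent species (hypothetical donor sequence).
recombinant_b = dict(typical_b, locus1="AGTAGTAGTA")

loci = sorted(typical_b)

# Distance judged on the recombined locus alone.
single_locus = p_distance(typical_b["locus1"], recombinant_b["locus1"])

# Distance judged on all four loci concatenated in a fixed order.
multilocus = p_distance(
    "".join(typical_b[locus] for locus in loci),
    "".join(recombinant_b[locus] for locus in loci),
)

print(f"single locus: {single_locus:.2f}")  # 0.60
print(f"concatenated: {multilocus:.2f}")    # 0.15
```

With one locus the recombinant looks highly divergent; across four loci the signal from the replaced locus is diluted by the three loci that still match species B.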
Figure 2
Resolving populations of B. pseudomallei, B. mallei and B. thailandensis. All of the isolates in the B. pseudomallei MLST database (which includes isolates of closely related species) were extracted and the sequences at the seven MLST loci were concatenated for each different multilocus genotype (strain) and a tree was constructed using MrBayes v. 3.1. The dataset included 400 different strains (STs) of B. pseudomallei, 17 of B. thailandensis, and two each of B. mallei and B. oklahomensis. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. All nucleotide sites were used in the analysis. A general time reversible model was implemented with rate matrix r(A↔C) 0.012: r(A↔G) 0.419: r(A↔T) 0.020: r(C↔G) 0.024: r(C↔T) 0.509: r(G↔T) 0.016; nucleotide frequencies A 0.18: C 0.35: G 0.32: T 0.15 and gamma parameter α=0.11. Pinvar=0.82. All trees and model parameters are based on 10 000 samples from the posterior probability at stationarity.
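As a worked illustration of the substitution model quoted in this caption, the sketch below assembles a general time reversible (GTR) instantaneous rate matrix from the listed exchangeabilities and nucleotide frequencies. It assumes the common parameterization q_ij = r_ij × π_j for i ≠ j, rescaled to a mean rate of one substitution per site; whether this matches the internal scaling used by MrBayes is an assumption, and the gamma shape (α) and proportion of invariant sites (Pinvar) are not modelled here.

```python
BASES = "ACGT"

# Exchangeabilities and base frequencies as quoted in the Figure 2 caption.
exchange = {
    ("A", "C"): 0.012, ("A", "G"): 0.419, ("A", "T"): 0.020,
    ("C", "G"): 0.024, ("C", "T"): 0.509, ("G", "T"): 0.016,
}
freqs = {"A": 0.18, "C": 0.35, "G": 0.32, "T": 0.15}


def gtr_rate_matrix(exchange, freqs):
    """Build a GTR rate matrix Q (dict of dicts), scaled to mean rate 1."""

    def r(i, j):
        # Exchangeabilities are symmetric: r(i, j) == r(j, i).
        return exchange[(i, j)] if (i, j) in exchange else exchange[(j, i)]

    # Off-diagonal entries: q_ij = r_ij * pi_j.
    q = {
        i: {j: (r(i, j) * freqs[j] if i != j else 0.0) for j in BASES}
        for i in BASES
    }
    # Diagonal entries make each row sum to zero.
    for i in BASES:
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Rescale so the expected substitution rate at equilibrium equals 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / mean_rate for j in BASES} for i in BASES}


Q = gtr_rate_matrix(exchange, freqs)

for i in BASES:
    print(i, [round(Q[i][j], 4) for j in BASES])
```

The two transition classes (A↔G and C↔T) dominate the matrix, which reflects the large r(A↔G) and r(C↔T) values reported in the caption.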
Figure 3
Resolving populations of N. meningitidis, N. lactamica and N. gonorrhoeae. Bayesian tree constructed using the concatenated sequences (seven loci) of the first 500 different strains (STs) of N. meningitidis in the public Neisseria MLST database, all different strains of N. lactamica (171) and N. gonorrhoeae (67). The arrow shows the two strains of N. lactamica that cluster anomalously and have probably been incorrectly identified (see text). Only third codon positions were used in the analysis. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. Details as in figure 2 with rate matrix r(A↔C) 0.044: r(A↔G) 0.541: r(A↔T) 0.018: r(C↔G) 0.044: r(C↔T) 0.299: r(G↔T) 0.053; nucleotide frequencies A 0.11: C 0.44: G 0.24: T 0.21 and gamma parameter α=0.481. Pinvar=0.30.
Figure 4
Resolving populations of S. pneumoniae, S. pseudopneumoniae, S. mitis and S. oralis. Bayesian tree constructed using the concatenated sequences of six of the MLST loci of the authentic pneumococci and atypical pneumococci (now called S. pseudopneumoniae; Arbique et al. 2004) studied by Hanage et al. (2005b), and strains assigned as S. mitis and S. oralis. NT26 is a non-serotypable presumptive pneumococcus that arises from the branch leading to the S. pneumoniae cluster. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. All nucleotide sites were used in the analysis. Details as in figure 2 with rate matrix r(A↔C) 0.016: r(A↔G) 0.027: r(A↔T) 0.010: r(C↔G) 0.001: r(C↔T) 0.939: r(G↔T) 0.007; nucleotide frequencies A 0.31: C 0.18: G 0.24: T 0.27 and gamma parameter with a covarion model allowing rates to change across the tree s(off→on)=0.33 and s(on→off)=1.33.
Figure 5
Failure of single loci to resolve S. pneumoniae and related species. The individual gene trees (minimum evolution; all nucleotide sites) for three of the MLST loci used to produce figure 4. Sequences are coloured according to the species cluster in which they are present, as shown in figure 4.

References

    1. Arbique J.C, et al. Accuracy of phenotypic and genotypic testing for identification of Streptococcus pneumoniae and description of Streptococcus pseudopneumoniae sp. nov. J. Clin. Microbiol. 2004;42:4686–4696. doi:10.1128/JCM.42.10.4686-4696.2004
    2. Baldwin A, et al. Multilocus sequence typing scheme that provides both species and strain differentiation for the Burkholderia cepacia complex. J. Clin. Microbiol. 2005;43:4665–4673. doi:10.1128/JCM.43.9.4665-4673.2005
    3. Boucher Y, Douady C.J, Sharma A.K, Kamekura M, Doolittle W.F. Intragenomic heterogeneity and intergenomic recombination among haloarchaeal rRNA genes. J. Bacteriol. 2004;186:3980–3990. doi:10.1128/JB.186.12.3980-3990.2004
    4. Christensen H, Kuhnert P, Olsen J.E, Bisgaard M. Comparative phylogenies of the housekeeping genes atpD, infB and rpoB and the 16S rRNA gene within the Pasteurellaceae. Int. J. Syst. Evol. Microbiol. 2004;54:1601–1609. doi:10.1099/ijs.0.03018-0
    5. Cohan F.M. What are bacterial species? Annu. Rev. Microbiol. 2002;56:457–487. doi:10.1146/annurev.micro.56.012302.160634
