, 10, 2896
eCollection

Comparative Genomic Analysis of Soil Dwelling Bacteria Utilizing a Combinational Codon Usage and Molecular Phylogenetic Approach Accentuating on Key Housekeeping Genes

Jayanti Saha et al. Front Microbiol.

Abstract

Soil is a diverse and complex ecological niche, home to a myriad of microorganisms, particularly bacteria. The physico-chemical complexity of soil gives rise to a plethora of physiological variations among the different types of soil-dwelling bacteria, and in turn to wide variation in genome structure and complexity. This makes it attractive to analyze and compare the genomes of a large number of soil bacteria in order to understand their genome complexity and evolution. In this study, a combination of codon usage analysis and molecular phylogenetics was applied to the whole genomes and to key housekeeping genes such as infB (translation initiation factor 2), trpB (tryptophan synthase, beta subunit), atpD (ATP synthase, beta subunit), and rpoB (RNA polymerase, beta subunit) of 92 soil bacterial species spread across the entire eubacterial domain and residing in different soil types. The results indicated a direct relationship of genome size with codon bias and coding frequency in the studied bacteria. The codon usage profile of trpB was found to differ from those of the other housekeeping genes, with a large number of bacteria having a greater percentage of genes with Nc values less than the Nc of trpB. The overall codon usage bias profile also indicated that codon usage bias in the key housekeeping genes of soil bacteria was mainly due to selectional pressure and not mutation. Analysis of the hydrophobicity of the gene product encoded by the rpoB coding sequences demonstrated tight clustering across all the soil bacteria, suggesting conservation of protein structure for maintenance of form and function. The phylogenetic affinities inferred using the 16S rRNA gene and the housekeeping genes demonstrated conflicting signals, with the trpB gene being the noisiest. The housekeeping gene atpD was found to show the least evolutionary change in the soil bacteria considered in this study, except in two Clostridium species. The phylogenetic and codon usage analyses of the soil bacteria consistently demonstrated the relatedness of Azotobacter chroococcum to different species of the genus Pseudomonas.

Keywords: atpD gene; codon usage bias (CUB); housekeeping genes; infB gene; molecular phylogenetics; rpoB gene; soil bacteria; trpB gene.

Figures

Figure 1
A scatter plot depicting the hydrophobicity profile of the gene products encoded by the four housekeeping genes rpoB, atpD, infB, and trpB from the soil bacterial species considered in this study. The y-axis corresponds to the hydrophobicity value, whereas the x-axis corresponds to the bacterial species sorted in alphabetical order as given in Table 1.
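The hydrophobicity value on the y-axis can be illustrated with a small sketch. The page does not say which hydropathy scale was used, so this assumes the commonly used GRAVY score, the mean Kyte-Doolittle hydropathy over all residues of a protein. The names `KD` and `gravy` are illustrative and not taken from the study.

```python
# Kyte-Doolittle hydropathy index per amino acid (assumed scale; the
# source does not name the scale behind its hydrophobicity values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Characters outside the 20 standard amino acids (e.g. 'X' or '*')
    are skipped rather than scored.
    """
    residues = [aa for aa in protein.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KD[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical toy peptide, not a sequence from the study.
    print(round(gravy("MKTAYIAKQR"), 3))
```

Positive scores indicate a more hydrophobic product and negative scores a more hydrophilic one, so orthologous proteins with similar scores would cluster tightly on such a plot.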
Figure 2
A combined genomic Nc plot utilizing all the coding sequences of the whole genomes, depicting the three typical modes of aggregation of coding sequences. Left centric aggregation is represented by Clostridium butyricum JKY6D1 (in purple), mid centric aggregation by Nitrosomonas communis Nm2 (in green), and right centric aggregation by Micrococcus luteus NCTC 2665 (in red). The dashed blue line represents the null hypothesis curve, which assumes that codon usage bias is solely due to mutation and not selection (Wright, 1990).
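The null hypothesis curve in an Nc plot follows Wright's (1990) approximation for the expected effective number of codons given only the GC content at synonymous third positions (GC3s): Nc = 2 + s + 29 / (s² + (1 − s)²), with s = GC3s. The sketch below evaluates that curve and computes a simplified GC3 for a coding sequence. The function names are illustrative, and `gc3` counts every third codon position, whereas GC3s strictly excludes codons without synonymous alternatives (Met, Trp, stop codons).

```python
def expected_nc(gc3s: float) -> float:
    """Expected Nc under Wright's (1990) null model for a given GC3s (0..1)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must lie between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS.

    Simplification: all complete codons are counted, including Met, Trp
    and stop codons, so this is GC3 rather than GC3s.
    """
    thirds = cds.upper()[2::3]  # third base of every complete codon
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # The curve peaks at GC3s = 0.5 and falls toward both extremes.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(s, round(expected_nc(s), 2))
```

A coding sequence whose observed Nc lies well below `expected_nc` at its GC3s shows more bias than base composition alone would produce, which is the usual basis for attributing the bias to selection.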
Figure 3
A combined Nc plot of the four housekeeping genes rpoB, atpD, infB, and trpB from the 92 soil bacterial species included in this study, depicting selectional pressure as a major unifying force in shaping codon usage pattern. The dashed blue line represents the null hypothesis curve which suggests that codon usage bias is solely due to mutation and not selection (Wright, 1990).
Figure 4
A phylogenetic tree showing the relationship between the soil bacterial species considered in this study based on 16S rRNA gene sequences along with Gram nature, taxonomic position and codon usage annotation data. The names of the species are colored according to their Gram nature, with magenta and blue representing Gram negative and Gram positive, respectively. The outermost semicircle with magenta bars represents the genomic GC3 while the innermost semicircle with blue bars represents the genomic Nc. The middle strip with yellow to red color gradient depicts the genomic GC content with red representing maximum GC content. The evolutionary history was inferred by using the Maximum Likelihood method based on the Kimura 2-parameter model (Kimura, 1980). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The tree with the highest log likelihood (−6,331.0306) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 0.5623)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 40.6240% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. There were a total of 357 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Kumar et al., 2008). The visualization and annotation of the phylogenetic tree were done using iTOL ver. 4.4.2 (Letunic and Bork, 2007).
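The Kimura 2-parameter model named above treats transitions (A↔G, C↔T) and transversions as separate substitution classes. The sketch below computes the classic pairwise K2P distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the observed proportions of transitions and transversions. It only illustrates the model: it is not the maximum likelihood calculation performed in MEGA6, it ignores the +G and +I rate terms, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Pairwise Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are
    dropped before counting.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Hypothetical toy alignment with one transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```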
Figure 5
Phylogenetic tree showing the relationship between the soil bacterial species considered in this study based on rpoB gene sequences along with Gram nature, taxonomic position and codon usage annotation data. The names of the species are colored according to their Gram nature, with magenta and blue representing Gram negative and Gram positive, respectively. The outermost semicircle with green bars represents the GC3 content of rpoB sequences while the innermost semicircle with blue bars represents the Nc of the rpoB coding sequences. The middle strip with cyan to orange color gradient depicts the variation in hydrophobicity of the protein encoded by rpoB coding sequences. The evolutionary history was inferred by using the Maximum Likelihood method based on the General Time Reversible model (Nei and Kumar, 2000). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The tree with the highest log likelihood (−73,689.4674) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 0.7988)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 22.2913% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. There were a total of 1,726 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Kumar et al., 2008). The visualization and annotation of the phylogenetic tree were done using iTOL ver. 4.4.2 (Letunic and Bork, 2007).
Figure 6
Phylogenetic tree showing the relationship between the soil bacterial species considered in this study based on atpD gene sequences along with Gram nature, taxonomic position and codon usage annotation data. The names of the species are colored according to their Gram nature, with magenta and blue representing Gram negative and Gram positive, respectively. The outermost semicircle with green bars represents the GC3 content of atpD sequences while the innermost semicircle with blue bars represents the Nc of the atpD coding sequences. The middle strip with cyan to orange color gradient depicts the variation in hydrophobicity of the protein encoded by the atpD coding sequences. The evolutionary history was inferred by using the Maximum Likelihood method based on the General Time Reversible model (Nei and Kumar, 2000). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 0.9877)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 6.2680% sites). All positions containing gaps and missing data were eliminated. There were a total of 1,123 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Kumar et al., 2008). The visualization and annotation of the phylogenetic tree were done using iTOL ver. 4.4.2 (Letunic and Bork, 2007).
Figure 7
Phylogenetic tree showing the relationship between the soil bacterial species considered in this study based on infB gene sequences along with Gram nature, taxonomic position and codon usage annotation data. The names of the species are colored according to their Gram nature, with magenta and blue representing Gram negative and Gram positive, respectively. The outermost semicircle with green bars represents the GC3 content of infB sequences while the innermost semicircle with blue bars represents the Nc of the infB coding sequences. The middle strip with cyan to orange color gradient depicts the variation in hydrophobicity of the protein encoded by infB coding sequences. The evolutionary history was inferred by using the Maximum Likelihood method based on the General Time Reversible model (Nei and Kumar, 2000). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The tree with the highest log likelihood (−84,778.7972) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 1.1105)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 16.2594% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. There were a total of 1,617 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Kumar et al., 2008). The visualization and annotation of the phylogenetic tree were done using iTOL ver. 4.4.2 (Letunic and Bork, 2007).
Figure 8
Phylogenetic tree showing the relationship between the soil bacterial species considered in this study based on trpB gene sequences along with Gram nature, taxonomic position and codon usage annotation data. The names of the species are colored according to their Gram nature, with magenta and blue representing Gram negative and Gram positive, respectively. The outermost semicircle with green bars represents the GC3 content of trpB sequences while the innermost semicircle with blue bars represents the Nc of the trpB coding sequences. The middle strip with cyan to orange color gradient depicts the variation in hydrophobicity of the protein encoded by trpB coding sequences. The evolutionary history was inferred by using the Maximum Likelihood method based on the General Time Reversible model (Nei and Kumar, 2000). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The tree with the highest log likelihood (−57,296.2790) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 1.2054)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 14.7706% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. There were a total of 1,098 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Kumar et al., 2008). The visualization and annotation of the phylogenetic tree were done using iTOL ver. 4.4.2 (Letunic and Bork, 2007).

References

    1. Aislabie J., Deslippe J., Dymond J. (2013). Soil Microbes and Their Contribution to Soil Services. Lincoln, OR: Manaaki Whenua Press.
    2. Andújar C., Arribas P., Vogler A. (2017). Terra incognita of soil biodiversity: unseen invasions under our feet. Mol. Ecol. 26, 3087–3089. doi: 10.1111/mec.14112
    3. Babbitt G. A., Alawad M. A., Schulze K. V., Hudson A. O. (2014). Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid. Nucleic Acids Res. 42, 10915–10926. doi: 10.1093/nar/gku811
    4. Baldauf S. L., Roger A. J., Wenk-Siefert I., Doolittle W. F. (2000). A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290, 972–977. doi: 10.1126/science.290.5493.972
    5. Barcellos F. G., Menna P., Da Silva Batista J. S., Hungria M. (2007). Evidence of horizontal transfer of symbiotic genes from a Bradyrhizobium japonicum inoculant strain to indigenous diazotrophs Sinorhizobium (Ensifer) fredii and Bradyrhizobium elkanii in a Brazilian Savannah Soil. Appl. Environ. Microbiol. 73, 2635–2643. doi: 10.1128/AEM.01823-06
