Front Microbiol. 2016 Mar 31:7:433.
doi: 10.3389/fmicb.2016.00433. eCollection 2016.

Quantifying the Relative Importance of Phylogeny and Environmental Preferences As Drivers of Gene Content in Prokaryotic Microorganisms

Javier Tamames et al. Front Microbiol.

Abstract

Two complementary forces shape microbial genomes: vertical inheritance of genes by phylogenetic descent, and acquisition of new genes related to adaptation to particular habitats and lifestyles. Quantification of the relative importance of each driving force has proved difficult. We determined the contribution of each factor, and identified particular genes or biochemical/cellular processes linked to environmental preferences (i.e., propensity of a taxon to live in particular habitats). Three types of data were compared: (i) complete genomes, which provide gene content of different taxa; (ii) phylogenetic information, via alignment of 16S rRNA sequences, which allowed determination of the distance between taxa; and (iii) distribution of species in environments via 16S rRNA sampling experiments, reflecting environmental preferences of different taxa. The combination of these three datasets made it possible to describe and quantify the relationships among them. We found that, although phylogenetic descent was responsible for shaping most genomes, a discernible part of the latter was correlated with environmental adaptations. Particular families of genes were identified as environmental markers, as supported by direct studies such as metagenomic sequencing. These genes are likely important for adaptation of bacteria to particular conditions or habitats, such as carbohydrate or glycan metabolism genes being linked to host-associated environments.

Keywords: bioinformatics; environmental preference; genome content; genome evolution; habitat preference; phylogenetic diversity.

Figures

Figure 1
Outline of the procedure followed. Primary data were taken from NCBI genomes, GenBank Env and Greengenes. Matrices of properties for every genus were created with these data. Phylogenetic distances between genera were obtained from the Greengenes alignment. Correlations between each pair of genera were computed to generate gene content and environmental correlations, and co-occurrence strength was calculated by a Fisher test of the co-occurrence data. Finally, a combined matrix was created with pairs of genera in rows and the four measures in four columns. The matrices shown in green were used in the first part of the paper and the combined matrix in red in the second. A more detailed description can be found in Figure S1 in the Supplementary Material.
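
The pairwise measures named in this outline can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the toy profiles, the function names `pearson` and `fisher_right_tail`, the use of Pearson correlation, and the choice of a one-sided Fisher exact test are all assumptions made for the example.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fisher_right_tail(both, only_a, only_b, neither):
    """One-sided Fisher exact p-value for a 2x2 co-occurrence table.

    Returns the probability of seeing at least `both` shared samples
    if the two genera were distributed independently across samples.
    """
    total = both + only_a + only_b + neither
    with_a = both + only_a          # samples containing genus A
    with_b = both + only_b          # samples containing genus B
    max_shared = min(with_a, with_b)
    denominator = comb(total, with_b)
    tail = sum(
        comb(with_a, k) * comb(total - with_a, with_b - k)
        for k in range(both, max_shared + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical gene-family counts for two genera (not real data).
    genus_a = [3, 0, 5, 1, 2]
    genus_b = [2, 1, 4, 0, 3]
    print("gene content correlation:", pearson(genus_a, genus_b))

    # Hypothetical co-occurrence counts over 20 samples (not real data).
    print("co-occurrence p-value:", fisher_right_tail(6, 2, 1, 11))
```

Applying both functions to every pair of genera would yield rows of the kind the caption describes, with one column per measure.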
Figure 2
MDS decomposition of the gene content matrix. Each dot corresponds to a different genus, and these are colored according to different criteria in each plot. (A) colored by environmental preferences. (B) colored according to lifestyles. (C) colored according to taxonomy.
Figure 3
Canonical Correspondence Analysis (CCA) of the gene content matrix, using environmental preferences as explanatory variables. Orange crosses show the genera, blue squares the individual COGs in the matrix, and yellow circles represent the projections of the habitat preferences.
Figure 4
Three-dimensional representation of genomic, environmental, and phylogenetic distances among all pairs of bacterial and archaeal genera. Each point in the plot corresponds to a pair of genera, indicating their particular gene content, environmental and phylogenetic distances. Examples discussed in the text are highlighted.
Figure 5
Relationships between phylogenetic distance and gene content and environmental correlations. Box-plots have been generated as explained in Figure S3, and show the quantification of the relationships between the three distances. Boxes are generated by discretizing the variable on the x-axis, and show the distribution of the y-axis measures corresponding to each discrete value of x. The plots therefore show how the variable in y responds to changes in x. Permuting the axes changes the discretization to the other variable. The boxes correspond to the upper and lower quartiles of the data, and the marks within correspond to the median. Lines outside the boxes (whiskers) show the variability outside the quartiles, as an indication of the dispersion of the data. The plots show how gene content similarity responds to: (A) phylogenetic distance, or (B) environmental correlation.
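
The discretization behind such box-plots can be sketched as follows. This is an illustrative example only: the bin edges, the toy values, the function name `binned_quartiles`, and the "inclusive" quartile method are assumptions, and the exact procedure of Figure S3 is not reproduced here.

```python
from statistics import quantiles


def binned_quartiles(x, y, edges):
    """Group y by bins of x and return (Q1, median, Q3) per bin.

    `edges` are ascending bin boundaries; each bin is [low, high),
    except the last, which also includes its upper boundary.
    Bins with fewer than two values are skipped.
    """
    n_bins = len(edges) - 1
    groups = [[] for _ in range(n_bins)]
    for x_val, y_val in zip(x, y):
        for i in range(n_bins):
            is_last = i == n_bins - 1
            in_bin = edges[i] <= x_val < edges[i + 1]
            if in_bin or (is_last and x_val == edges[-1]):
                groups[i].append(y_val)
                break

    summary = {}
    for i, values in enumerate(groups):
        if len(values) >= 2:
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
            summary[(edges[i], edges[i + 1])] = (q1, median, q3)
    return summary


if __name__ == "__main__":
    # Hypothetical phylogenetic distances (x) and gene content
    # correlations (y) for eight pairs of genera (not real data).
    phylo_distance = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    gene_corr = [0.9, 0.8, 0.85, 0.7, 0.5, 0.4, 0.45, 0.3]
    for bin_range, stats in binned_quartiles(
        phylo_distance, gene_corr, [0.0, 0.5, 1.0]
    ).items():
        print(bin_range, stats)
```

Swapping the roles of `x` and `y` in the call corresponds to permuting the axes, so that the other variable is the one discretized.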
Figure 6
Results for Mantel tests between matrices of gene content correlation and phylogenetic distance or environmental correlation. The plot shows the fit between gene content correlation and phylogenetic distance, between gene content and environmental correlations, and the corresponding partial tests discounting either the influence of environmental correlation or the influence of phylogenetic distance. The bars show 95% confidence intervals.
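
For readers unfamiliar with Mantel tests, the following sketch shows the general idea: correlate the off-diagonal entries of two square matrices, then judge significance by jointly permuting the rows and columns of one of them. It is a simplified illustration, not the analysis behind the figure. The Pearson statistic, the 999 permutations, the toy matrices, and the function names are assumptions. The partial statistic uses the standard first-order partial correlation formula, and confidence intervals are not computed.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p) for a two-sided Mantel test between matrices a and b."""
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    r_observed = pearson(upper_triangle(a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of `a` with the same ordering.
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(upper_triangle(permuted), flat_b)) >= abs(r_observed):
            hits += 1
    return r_observed, (hits + 1) / (permutations + 1)


def partial_mantel_r(a, b, c):
    """Correlation between a and b after discounting matrix c.

    Assumes c is not perfectly correlated with a or with b.
    """
    flat_a, flat_b, flat_c = upper_triangle(a), upper_triangle(b), upper_triangle(c)
    r_ab = pearson(flat_a, flat_b)
    r_ac = pearson(flat_a, flat_c)
    r_bc = pearson(flat_b, flat_c)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


if __name__ == "__main__":
    # Hypothetical 4-genus matrices (not real data).
    gene = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    phylo = [[0, 1, 2, 4], [1, 0, 1, 2], [2, 1, 0, 1], [4, 2, 1, 0]]
    env = [[0, 2, 1, 3], [2, 0, 3, 1], [1, 3, 0, 2], [3, 1, 2, 0]]
    print("gene vs phylo:", mantel(gene, phylo))
    print("gene vs phylo, discounting env:", partial_mantel_r(gene, phylo, env))
```

With only four genera there are few distinct permutations, so the p-value here is not meaningful; the example only shows the mechanics.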
