Genome Biol Evol. 2013;5(1):233-42.
doi: 10.1093/gbe/evt002.

Gene frequency distributions reject a neutral model of genome evolution

Alexander E Lobkovsky et al. Genome Biol Evol. 2013.

Abstract

Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a "shell" of moderately common genes, and a "cloud" of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
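As an illustration of the gene frequency distribution described in the abstract, the following minimal sketch counts how many gene families occur in exactly k genomes of a group. It is not the authors' pipeline: the function name `gene_frequency_spectrum`, the input layout (a mapping from gene family to the set of genomes containing it), and the toy data with four genomes are all assumptions made for this example.

```python
from collections import Counter


def gene_frequency_spectrum(presence, n_genomes):
    """Return a list whose (k-1)-th entry is the number of gene families
    present in exactly k of the n_genomes genomes.

    presence: dict mapping a gene family id to the set of genome ids
              that contain it (hypothetical input format).
    """
    counts = Counter(len(genomes) for genomes in presence.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]


# Toy data (invented for illustration): three families found in all four
# genomes, one found in two genomes, and three found in a single genome.
presence = {
    "geneA": {0, 1, 2, 3},
    "geneB": {0, 1, 2, 3},
    "geneC": {0, 1, 2, 3},
    "geneD": {0, 2},
    "geneE": {1},
    "geneF": {3},
    "geneG": {2},
}

spectrum = gene_frequency_spectrum(presence, n_genomes=4)
print(spectrum)  # [3, 1, 0, 3]
```

In this toy spectrum the counts are high at both ends and low in the middle, which is the qualitative U-shape the abstract refers to: rare "cloud" genes on the left, universal "core" genes on the right, and "shell" genes in between.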

Figures

Fig. 1. Probabilities c1 of encountering a unique gene and c10 of encountering a strictly common gene.
Fig. 2. Gene frequency distributions and model fits for four groups of bacteria. The underlying trees and the mean branch lengths are shown in the insets.
Fig. 3. Summary of the distributions of the AIC differences between the models with selection and the neutral model across all 400 analyzed groups of prokaryotes.
Fig. 4. Dependence of the selection strength estimated from the fits of models B, C, and D to the empirical gene frequency distributions on the mean branch length in the phylogenetic tree.
Fig. 5. The gene turnover rates estimated from the fits of the two-class model C to the empirical gene frequency distributions.
Fig. 6. The fit of the stochastic model D to the empirical gene frequency histogram: the residuals for gene commonality classes among all groups.
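
The model comparison summarized in Fig. 3 rests on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below shows how AIC differences relative to a baseline model could be computed. The helper names, the log-likelihood values, and the parameter counts are invented for illustration and are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aic_differences(fits, baseline):
    """AIC of each model minus the AIC of the baseline model.

    fits: dict mapping a model name to (log_likelihood, n_params).
    A negative difference means the model is preferred over the baseline.
    """
    base = aic(*fits[baseline])
    return {
        name: aic(*fit) - base
        for name, fit in fits.items()
        if name != baseline
    }


# Hypothetical fit results for one group of genomes (made-up numbers).
fits = {
    "neutral": (-120.0, 1),    # assumed: one free parameter
    "selection": (-100.0, 2),  # assumed: one extra parameter
}

deltas = aic_differences(fits, baseline="neutral")
print(deltas)  # {'selection': -38.0}
```

With these made-up numbers the selection model pays a penalty of 2 for its extra parameter but gains 40 from the improved log-likelihood, so its AIC is 38 lower than that of the neutral model.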

