Genetics. 2017 Feb;205(2):891-917.
doi: 10.1534/genetics.116.189621. Epub 2016 Dec 22.

Correlated Mutations and Homologous Recombination Within Bacterial Populations

Mingzhi Lin et al. Genetics. 2017 Feb.

Abstract

Inferring the rate of homologous recombination within a bacterial population remains a key challenge in quantifying the basic parameters of bacterial evolution. Due to the high sequence similarity within a clonal population, and unique aspects of bacterial DNA transfer processes, detecting recombination events based on phylogenetic reconstruction is often difficult, and estimating recombination rates using coalescent model-based methods is computationally expensive, and often infeasible for large sequencing data sets. Here, we present an efficient solution by introducing a set of mutational correlation functions computed using pairwise sequence comparison, which characterize various facets of bacterial recombination. We provide analytical expressions for these functions, which precisely recapitulate simulation results of neutral and adapting populations under different coalescent models. We used these to fit correlation functions measured at synonymous substitutions using whole-genome data on Escherichia coli and Streptococcus pneumoniae populations. We calculated and corrected for the effect of sample selection bias, i.e., the uneven sampling of individuals from natural microbial populations that exists in most datasets. Our method is fast and efficient, and does not employ phylogenetic inference or other computationally intensive numerics. By simply fitting analytical forms to measurements from sequence data, we show that recombination rates can be inferred, and the relative ages of different samples can be estimated. Our approach, which is based on population genetic modeling, is broadly applicable to a wide variety of data, and its computational efficiency makes it particularly attractive for use in the analysis of large sequencing datasets.

Keywords: Bolthausen–Sznitman coalescent; adapting populations; bacteria; homologous recombination; population diversity; sample ages; sample selection bias.

Figures

Figure 1
Illustration of population genetic correlation functions. On the left, a population of N genomic sequences, each of length L, is shown. A pair of sequences is compared at each site i, yielding the substitution sequence Sik, where k indexes the pair of genomes among all N(N−1)/2 possible pairs. A pair of positions along the sequence separated by a distance l is shown. The population diversity, d, the variance of pairwise distances, σ², and the correlation functions cM(l) (mutational correlation), cS(l) (structure correlation), and cR(l) (rate correlation) are obtained by taking averages or covariances in different directions across the substitution sequences.
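The pairwise comparison described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the covariance computed in `pooled_covariance` is an assumed, simplified stand-in, since the exact definitions of cM(l), cS(l), and cR(l) are not given in this excerpt.

```python
from itertools import combinations


def substitution_profiles(seqs):
    """Return one 0/1 list per pair of aligned sequences (1 = the pair differs at that site)."""
    return [[int(a != b) for a, b in zip(x, y)] for x, y in combinations(seqs, 2)]


def diversity_and_variance(profiles):
    """Mean pairwise distance d and the variance of per-pair distances across pairs."""
    per_pair = [sum(p) / len(p) for p in profiles]
    d = sum(per_pair) / len(per_pair)
    var = sum((x - d) ** 2 for x in per_pair) / len(per_pair)
    return d, var


def pooled_covariance(profiles, l):
    """Assumed definition: probability that sites i and i+l both differ,
    pooled over all pairs and sites, minus d squared."""
    d, _ = diversity_and_variance(profiles)
    joint = [p[i] * p[i + l] for p in profiles for i in range(len(p) - l)]
    return sum(joint) / len(joint) - d * d


# Toy alignment of N = 3 sequences of length L = 4, giving N(N-1)/2 = 3 pairs.
profiles = substitution_profiles(["AAAA", "AAAT", "AATT"])
d, var = diversity_and_variance(profiles)
cov_adjacent = pooled_covariance(profiles, 1)
```

Every quantity here is a plain average over the substitution sequences, which is why this style of analysis needs no phylogenetic reconstruction.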
Figure 2
Simulation results for population diversity and mutational correlations for KC (top row) and BSC (bottom row). Simulations used parameters N=1000, L=1000, f0=50, and γ=10⁻⁴, except where indicated, and all had identical mutational divergence Nμ=0.1. (A) The population diversity (d) and variance (σ²) in a population are shown as functions of the recombination coverage rate r for three different population sizes: 10², 10³, and 10⁴. (B, C) Correlation functions are shown for different recombination rates (γ): 0 (○), 10⁻⁴, and 10⁻³. In (C) inset, structure correlation is shown for different population sizes with shapes corresponding to (A). Full and approximate analytical solutions are shown in solid and dashed lines, respectively. (D) Population variance and correlation functions at adjacent sites (l=1) are shown as functions of γ.
Figure 3
Mutational correlation (cM) and population variance (σ²) in adapting populations. Simulation results are shown in circles, with error bars indicating SEM. Full analytical solutions based on the KC and BSC models are shown in dashed and solid lines, respectively. (A) shows cM(l) for different values of the selection strength, s. (B–E) show the dependence of cM(l=1) and σ² on s and γ. The simulations used parameters N=10³, L=10³, μ=10⁻⁴, γ=10⁻⁴, μs=10⁻⁶, s=10⁻², and f0=50, except where indicated.
Figure 4
Simulation results on inferring bulk population parameters from biased samples of closely related sequences. Simulations used parameters N=1000, L=10,000, f0=500, μ=10⁻⁴, and γ ranging from 10⁻⁴ to 10⁻³, and a total of 400 populations were simulated for a total time much longer than the coalescent time. For each population, we constructed a biased sample by selecting the cluster of five sequences having the lowest average pairwise coalescent time. (A) Measurement of the sample’s Ps,2(2)(l) (circles) and the bulk population’s P2(2)(l) normalized by the diversity, P2(2)(l)/d (triangles), as a function of distance l, with γ=10⁻⁴. The blue curve is the analytical form given in Equation 15, the red line is d(1 − l/f̄), and the horizontal dashed line indicates the bulk population diversity, d. (B) Inferred values of bulk population parameters θ and φ, and fragment size f̄, are shown relative to their true values (open circles). The diversity of the biased sample, ds, is shown relative to the population diversity, d (filled circles). For each biased sample of five sequences, Ps,2(2)(l) was calculated and fitting to Equation 16 was used to infer θ, φ, and f̄. Fitting was performed over the range 1 ≤ l ≤ 50 using nonlinear least squares (R Core Team 2016).
Figure 5
Whole-genome sequence analysis of natural E. coli and S. pneumoniae isolates. Measured correlations of synonymous substitutions are shown as circles for Ps,2(2)(l) (black), as well as cM(l) (red), cR(l) (blue), and cS(l) (green), where each cx(l) is normalized by the sample diversity, cx(l)/ds, for x=M,R,S. Dashed black line corresponds to the best fit of Ps,2(2)(l) using the form given in Equation 16. Parameter values are given in Table 2. The solid colored lines correspond to the predictions of the three correlation functions based on the fit, using the BSC model value of q (see Appendix E). We note that the excellent fit of cM(l) is not surprising, since this correlation function is determined entirely by Ps,2(2)(l), which was fit, while the predictions of cR(l) and cS(l) present an independent test of the theory. The dashed green and blue lines correspond to results of fitting q, indicating that deviations from the predictions are due mainly to the choice of coalescent model, which can be inferred and used to improve the prediction. The fitted values of q range from 0.22 to 0.48, where q=0.22 for KC and q=0.33 for BSC coalescents, indicating that population tree structures often follow KC or BSC statistics, but certain clades may exhibit more general coalescent statistics.
Figure 6
Illustration of possible transitions between configurations for one or two pairs of sites. Each horizontal line represents a single sequence, and small vertical lines represent sites. (A) shows the possible transitions and events for one pair of sites. When the pair become identical due to an indicated transition, we denote the site by a star. (B) shows different configurations of two pairs of sites, and the coalescent states in one or both sites, and their possible transitions and events are shown in (C–E). Transitions between configurations are summarized in (F), where solid arrows represent transitions due to reproduction, internal one-site transfer, or two-site transfer, and dashed arrows correspond to an external one-site transfer into a sequence with two sites.
Figure 7
Illustration of external one-site transfers and their impacts on the possible genealogical trees. In each tree, X and Y are the pair of sequences under consideration, and D is the donor sequence of the external transfer shown as an arrow from D to X. A red circle represents the MRCA of X and Y before the transfer, and a red star denotes the MRCA after the transfer.
Figure 8
Recombination barrier and the rate of successful transfers. The plot shows the population rate of successful transfers, Nγ, as a function of b, which controls transfer efficiency such that a higher b corresponds to a larger recombination barrier. The theoretical prediction, Nγ=φ/(1+θb), is shown in solid lines. Simulation results are shown for measured rates (open circles), and inferred rates based on fitting mutational correlations (solid triangles). The solid circles are the inferred rates corrected by the scale factor, 1+θb/3. Inset shows the data collapsing onto the theoretical prediction when plotted as a function of θb. Simulations used parameters N=1000, L=1000, f0=50, and γ=10⁻⁴, with various values of b and μ as indicated. Each data point corresponds to an average over 10,000 simulations, which were run over a total time much longer than the coalescent time. Fitting mutational correlations was carried out using the mean-field result for cM(l) in Equation 10.
