PLoS Genet. 2018 Apr 23;14(4):e1007341.
doi: 10.1371/journal.pgen.1007341. eCollection 2018 Apr.

Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia

Daniel R Schrider et al. PLoS Genet. 2018.

Abstract

Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees), capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Heatmaps showing several methods’ sensitivity to detect introgression.
We show the fraction of simulated genomic regions with introgression occurring under various combinations of migration times (TM, shown as a fraction of the population divergence time TD) and intensities (PM, the probability that a given lineage will be included in the introgression event) that are detected successfully by each method. (A) Accuracy of dmin and Gmin statistics, where a simulated region is classified as introgressed if the values of these statistics are found in the lower 5% tail of the distribution under complete isolation (from simulations). Thus, the false positive rate is fixed at 5%. (B) The accuracy of FILET on these same simulations. On the left we show the fraction of regions correctly classified as introgressed (compare to panel A). On the right, we show the fraction of all simulated regions that are not only classified as introgressed, but also for which the direction of gene flow was correctly inferred (i.e. if the direction is inferred with 100% accuracy for a given cell in the heatmap, the color shade of that cell will be identical to that in the heatmap on the left). The false positive rate, as determined from applying FILET to a simulated test set with no migration, is also shown.
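The tail-based rule described for panel A can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy null distribution are hypothetical, and it assumes the null values of the statistic come from simulations under complete isolation.

```python
def lower_tail_threshold(null_values, alpha=0.05):
    """Return a cutoff such that about a fraction alpha of null values lie strictly below it."""
    ordered = sorted(null_values)
    k = int(round(alpha * len(ordered)))  # number of null values allowed in the tail
    return ordered[k]


def is_introgressed(stat_value, threshold):
    """Call a region introgressed when its statistic falls in the lower tail."""
    return stat_value < threshold


# Hypothetical null distribution: 100 distinct values standing in for
# dmin or Gmin computed on simulations with no migration.
null_stats = list(range(100))
threshold = lower_tail_threshold(null_stats)

# Applying the rule back to the null simulations recovers the fixed
# false positive rate (exactly alpha here because there are no ties).
false_positives = sum(is_introgressed(v, threshold) for v in null_stats)
print(threshold, false_positives / len(null_stats))
```

With tied values in the null distribution the realized false positive rate can fall slightly below the nominal 5%.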
Fig 2. A comparison of the power and resolution of FILET and ChromoPainter using simulations of a 1 Mb chromosome where introgression was allowed within the central 100 kb region.
As in Fig 1, the population split time was set to N generations ago, and the darkness of the heatmap shows sensitivity to introgression. Unlike Fig 1, here we are measuring sensitivity at the level of the individual base pair rather than evaluating the question of whether a window at large was recovered as containing introgressed alleles. The “coarse” version of FILET refers to a FILET classifier trained to detect introgression in 10 kb windows, which was applied to sliding windows (1 kb step size) across the chromosome. The “fine” version of FILET applied a classifier trained on 1 kb windows to sliding windows (100 bp step size) within those regions classified as introgressed by the FILET classifier. The lenient version of ChromoPainter required evidence of introgression at a single SNP to identify introgression, while the stringent version required candidate regions to contain at least 25 consecutive SNPs supporting introgression.
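The window layout and the per-base-pair sensitivity measure described in this caption can be sketched as below. This is a hedged illustration only: the helper names are hypothetical, coordinates are half-open, the central 100 kb region is assumed to span positions 450,000 to 550,000 of the 1 Mb chromosome, and the called windows are made-up inputs rather than classifier output.

```python
def sliding_windows(start, end, size, step):
    """Yield half-open (lo, hi) windows of a fixed size, advancing by step."""
    lo = start
    while lo + size <= end:
        yield (lo, lo + size)
        lo += step


def per_base_sensitivity(called_windows, true_region):
    """Fraction of base pairs in true_region covered by at least one called window."""
    t_lo, t_hi = true_region
    covered = set()
    for lo, hi in called_windows:
        covered.update(range(max(lo, t_lo), min(hi, t_hi)))
    return len(covered) / (t_hi - t_lo)


# "Coarse" layout: 10 kb windows with a 1 kb step across the 1 Mb chromosome.
coarse = list(sliding_windows(0, 1_000_000, 10_000, 1_000))

# "Fine" layout: 1 kb windows with a 100 bp step inside one coarse window.
fine = list(sliding_windows(450_000, 460_000, 1_000, 100))

# Hypothetical calls covering the first half of the introgressed region.
called = [(450_000, 500_000)]
print(len(coarse), len(fine), per_base_sensitivity(called, (450_000, 550_000)))
```

Measuring sensitivity per base pair, rather than per window, rewards methods that localize the introgressed segment and not just detect that one exists.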
Fig 3. Inferred joint population history of D. simulans and D. sechellia, and power to detect introgression under this model.
(A) The parameterization of our best-fitting demographic model. Migration rates are shown by arrows, and are in units of 2 × N_anc × m, where m is the probability of migration per individual in the source population per generation. (B) Confusion matrix showing FILET’s classification accuracy under this model as assessed on an independent simulated test set. Perfect accuracy would be 100% along the entire diagonal from top-left to bottom-right, and the false positive rate is the sum of top-middle and top-right cells.
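How to read the confusion matrix in panel B can be sketched with a small example. The counts below are invented for illustration and are not results from the paper; the sketch assumes rows are true classes and columns are predicted classes, with the no-introgression class first, which is what the caption's description of the false positive rate implies. The order of the two introgression classes is an assumption.

```python
def row_normalize(counts):
    """Convert a matrix of counts into per-row fractions (each row sums to 1)."""
    return [[c / sum(row) for c in row] for row in counts]


def false_positive_rate(matrix):
    """Sum of the top-middle and top-right cells: true 'no introgression'
    regions that were classified as either introgression class."""
    return matrix[0][1] + matrix[0][2]


def per_class_accuracy(matrix):
    """Diagonal entries: fraction of each true class classified correctly."""
    return [matrix[i][i] for i in range(len(matrix))]


# Hypothetical counts. Rows: true class; columns: predicted class.
# Class order (assumed): none, one direction of gene flow, the other direction.
counts = [
    [980, 10, 10],
    [200, 750, 50],
    [300, 50, 650],
]
matrix = row_normalize(counts)
print(false_positive_rate(matrix), per_class_accuracy(matrix))
```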
Fig 4. A large genomic region on 3R classified by FILET as introgressed from D. simulans to D. sechellia.
Values of dd-sim and dmin (upper two panels) within each 10 kb window in the region are shown, along with the posterior probability of introgression from FILET (i.e. 1 − P(no introgression)). Clustered regions classified as introgressed are shown as gray rectangles superimposed over these probabilities. Also shown are windowed values of π in D. sechellia, with the sweep region highlighted in red, and the locations of annotated genes with associated FlyBase identifiers [90].
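The plotted probability, 1 − P(no introgression), and the grouping of called windows into clustered regions can be sketched as below. The caption does not state the clustering rule or the probability cutoff, so both are assumptions here: called windows that touch or overlap are merged, and the cutoff of 0.5 is a hypothetical default. The function name and input values are also illustrative.

```python
def cluster_introgressed_windows(windows, p_none, cutoff=0.5):
    """Merge touching or overlapping windows whose introgression probability,
    1 - P(no introgression), exceeds the cutoff.

    windows: list of half-open (lo, hi) tuples sorted by start position.
    p_none:  per-window probability of the no-introgression class.
    """
    clusters = []
    for (lo, hi), p in zip(windows, p_none):
        if 1.0 - p <= cutoff:
            continue  # window not called as introgressed
        if clusters and clusters[-1][1] >= lo:
            # Extend the previous cluster to the end of this window.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], hi))
        else:
            clusters.append((lo, hi))
    return clusters


# Four adjacent 10 kb windows with made-up class probabilities.
windows = [(0, 10_000), (10_000, 20_000), (20_000, 30_000), (30_000, 40_000)]
p_none = [0.9, 0.1, 0.2, 0.95]
print(cluster_introgressed_windows(windows, p_none))
```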

References

    1. Mallet J. Hybridization as an invasion of the genome. Trends in ecology & evolution. 2005;20(5):229–37.
    2. Whitney KD, Ahern JR, Campbell LG, Albert LP, King MS. Patterns of hybridization in plants. Perspectives in Plant Ecology, Evolution and Systematics. 2010;12(3):175–82.
    3. Barton NH. The role of hybridization in evolution. Mol Ecol. 2001;10(3):551–68.
    4. Tung J, Barreiro LB. The contribution of admixture to primate evolution. Current opinion in genetics & development. 2017;47:61–8.
    5. Baack EJ, Rieseberg LH. A genomic view of introgression and hybrid speciation. Current opinion in genetics & development. 2007;17(6):513–8.