BMC Evol Biol. 2016 Nov 8;16(1):240.
doi: 10.1186/s12862-016-0791-0.

How and how much does RAD-seq bias genetic diversity estimates?


Marie Cariou et al. BMC Evol Biol.

Abstract

Background: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tends to underestimate genetic diversity.
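The drop-out mechanism described above can be shown with a toy calculation. A mutation in the restriction site makes a haplotype invisible to RAD-seq, and if that haplotype is also divergent, the remaining sample looks less diverse. The four haplotypes below are hypothetical, as is the choice of the EcoRI site `GAATTC`. π is computed as the mean number of pairwise differences per site over the flanking bases only.

```python
from itertools import combinations

SITE = "GAATTC"  # assumed enzyme: EcoRI recognition sequence (illustrative choice)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site among equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


# Hypothetical haplotypes: restriction site followed by a 10-bp flank.
haplotypes = [
    "GAATTC" + "AAAAAAAAAA",
    "GAATTC" + "AAAAAAAAAT",
    "GAATTC" + "AAAAAAAATA",
    "GACTTC" + "CCAAAAAAAA",  # site mutated and flank divergent: lost by RAD-seq
]

flanks_all = [h[len(SITE):] for h in haplotypes]
flanks_rad = [h[len(SITE):] for h in haplotypes if h.startswith(SITE)]

pi_true = nucleotide_diversity(flanks_all)  # all haplotypes
pi_rad = nucleotide_diversity(flanks_rad)   # only haplotypes with an intact site
print(f"pi_true = {pi_true:.4f}, pi_RAD = {pi_rad:.4f}, ratio = {pi_rad / pi_true:.3f}")
# prints: pi_true = 0.2000, pi_RAD = 0.1333, ratio = 0.667
```

The haplotype with the broken site is also the most divergent one. Dropping it therefore lowers π by a third, even though the three retained haplotypes are sequenced perfectly.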

Results: Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) confront predictions with real data from in silico digestion of full genomes, and (3) provide a proof of concept for an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimate, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. In contrast, spatial genetic structure tends to reduce the bias. We confront the neutral and panmictic model with "ideal" empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Schizophyllum commune, which harbour moderate and high genetic diversity, respectively. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimates, albeit with some imprecision.
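The dependence of the bias on the level of polymorphism under a neutral and panmictic model can be reproduced with a small simulation. This is a minimal sketch, not the authors' pipeline. It assumes the standard neutral coalescent (time in units of 2N generations, so that the expected πtrue equals θ), infinite-sites mutations, a single 6-bp restriction site followed by a 100-bp sequenced flank, and 10 sampled haplotypes. A haplotype is dropped from the RAD sample if any mutation hits its restriction site. All function names and default values are illustrative.

```python
import random
from itertools import combinations


def simulate_locus(n, theta, site_len, flank_len, rng):
    """Simulate one RAD locus under the neutral coalescent with infinite sites.

    Time is measured in units of 2N generations, so mutations accumulate at
    rate theta / 2 per site per unit of branch length.
    Returns (flank mutation sets per haplotype, site-intact flags per haplotype).
    """
    total_sites = site_len + flank_len
    rate = theta / 2 * total_sites             # mutation rate for the whole locus
    flank_muts = [set() for _ in range(n)]
    intact = [True] * n
    lineages = [([i], 0.0) for i in range(n)]  # (sampled leaves below, birth time)
    t, next_id = 0.0, 0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time to next coalescence
        a, b = sorted(rng.sample(range(k), 2), reverse=True)
        pair = [lineages.pop(a), lineages.pop(b)]
        for leaves, birth in pair:
            # Poisson number of mutations on this branch, drawn as arrival times
            s = rng.expovariate(rate)
            while s < t - birth:
                in_site = rng.randrange(total_sites) < site_len
                for leaf in leaves:
                    if in_site:
                        intact[leaf] = False           # restriction site destroyed
                    else:
                        flank_muts[leaf].add(next_id)  # new segregating site
                next_id += 1
                s += rng.expovariate(rate)
        lineages.append((pair[0][0] + pair[1][0], t))
    return flank_muts, intact


def pairwise_pi(muts, flank_len):
    """Mean pairwise differences per flank site (mutations carried by only one of two)."""
    pairs = list(combinations(muts, 2))
    return sum(len(x ^ y) for x, y in pairs) / (len(pairs) * flank_len)


def pi_true_vs_rad(theta, n=10, loci=2000, site_len=6, flank_len=100, seed=1):
    """Average pi over loci using all haplotypes (true) or intact-site haplotypes (RAD)."""
    rng = random.Random(seed)
    true_vals, rad_vals = [], []
    for _ in range(loci):
        muts, intact = simulate_locus(n, theta, site_len, flank_len, rng)
        true_vals.append(pairwise_pi(muts, flank_len))
        kept = [m for m, ok in zip(muts, intact) if ok]
        if len(kept) >= 2:                     # locus still observable by RAD-seq
            rad_vals.append(pairwise_pi(kept, flank_len))
    mean_true = sum(true_vals) / len(true_vals)
    mean_rad = sum(rad_vals) / len(rad_vals) if rad_vals else float("nan")
    return mean_true, mean_rad


for theta in (0.001, 0.01, 0.05, 0.1):
    pt, pr = pi_true_vs_rad(theta)
    print(f"theta={theta:<6} pi_true={pt:.4f} pi_RAD={pr:.4f} ratio={pr / pt:.3f}")
```

Under these assumptions the ratio πRAD/πtrue should stay close to 1 at low θ and fall as θ grows. Exact values depend on the seed and on the assumed site and flank lengths.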

Conclusion: The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2 %, the bias is of minor importance compared with other sources of uncertainty, such as heterogeneous base composition or technical artefacts. The neutral panmictic model provides a practical means of correcting the bias through ABC, albeit with some imprecision. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.
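An ABC correction based on the neutral panmictic model can be sketched as plain rejection sampling. The steps are: draw θ from a prior, simulate πRAD under the model, keep the draws whose simulated πRAD is closest to the observed value, and report the mean of the accepted θ. This is a generic rejection-ABC sketch, not the authors' implementation. The uniform prior, the single summary statistic, the 10 % acceptance rate and the simulation settings (infinite-sites neutral coalescent, a 6-bp restriction site, a 100-bp flank, 10 haplotypes, 40 loci) are all illustrative assumptions.

```python
import math
import random
from itertools import combinations


def simulate_pi_rad(theta, rng, n=10, loci=40, site_len=6, flank_len=100):
    """Mean flank pi among haplotypes with an intact restriction site.

    Neutral coalescent (time in 2N generations) with infinite-sites mutations;
    any mutation in the site makes a haplotype invisible to RAD-seq.
    """
    rate = theta / 2 * (site_len + flank_len)
    values = []
    for _ in range(loci):
        muts, intact = [set() for _ in range(n)], [True] * n
        lineages, t, next_id = [([i], 0.0) for i in range(n)], 0.0, 0
        while len(lineages) > 1:
            k = len(lineages)
            t += rng.expovariate(k * (k - 1) / 2)
            a, b = sorted(rng.sample(range(k), 2), reverse=True)
            pair = [lineages.pop(a), lineages.pop(b)]
            for leaves, birth in pair:
                s = rng.expovariate(rate)
                while s < t - birth:  # mutations along this branch
                    in_site = rng.randrange(site_len + flank_len) < site_len
                    for leaf in leaves:
                        if in_site:
                            intact[leaf] = False
                        else:
                            muts[leaf].add(next_id)
                    next_id += 1
                    s += rng.expovariate(rate)
            lineages.append((pair[0][0] + pair[1][0], t))
        kept = [m for m, ok in zip(muts, intact) if ok]
        if len(kept) >= 2:
            pairs = list(combinations(kept, 2))
            values.append(sum(len(x ^ y) for x, y in pairs) / (len(pairs) * flank_len))
    return sum(values) / len(values) if values else float("nan")


def abc_theta(observed_pi_rad, n_sims=300, accept=0.1, prior=(0.001, 0.15), seed=0):
    """Rejection ABC: posterior-mean theta given an observed pi_RAD."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)
        stat = simulate_pi_rad(theta, rng)
        if not math.isnan(stat):
            draws.append((abs(stat - observed_pi_rad), theta))
    draws.sort()  # closest simulated summaries first
    accepted = [theta for _, theta in draws[: max(1, int(accept * len(draws)))]]
    return sum(accepted) / len(accepted)


# Pseudo-observed data simulated with a known theta, then corrected by ABC.
obs = simulate_pi_rad(0.05, random.Random(42))
theta_hat = abc_theta(obs)
print(f"observed pi_RAD = {obs:.4f}; ABC estimate of theta = {theta_hat:.4f}")
```

Because πRAD is biased downward, the accepted θ values should tend to lie above the observed πRAD. Increasing `n_sims` and lowering `accept` tightens the estimate at the cost of more simulation.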

Keywords: ABC; Allele drop-out; Non-neutral model; Population genomics; Population structure; Reduced representation genomics.


Figures

Fig. 1
a The relation between πtrue, the nucleotide diversity measured using all individuals at RAD loci, and πRAD, measured using only loci associated with an intact restriction site, simulated under a neutral and panmictic model. b The relation between the amplitude of the RAD polymorphism bias (πRAD/πtrue) and the level of polymorphism. Solid lines represent local linear regressions.
Fig. 2
The relation between πtrue and πRAD in a non-neutral model. Black open dots: homogeneous θ values (neutral model, equivalent to Fig. 1). Solid coloured dots: heterogeneous θ values along the genome. Each simulation was run by randomly choosing a reference θ value (θref), which was used to assign different θ values to different genomic regions, with heterogeneity increasing across the three models. Blue dots: Model 1: 70 % of loci with θ1 = θref, 20 % with θ2 = θref/2 and 10 % with θ3 = θref/10; orange dots: Model 2: 70 % of loci with θ1 = θref, 20 % with θ2 = θref/10 and 10 % with θ3 = θref/100; red dots: Model 3: 50 % of loci with θ1 = θref, 40 % with θ2 = θref/10 and 10 % with θ3 = θref/100. Solid lines represent local linear regressions. The figure shows that the underestimation of genetic diversity is stronger when polymorphism is more heterogeneous along the genome.
Fig. 3
The relation between πtrue and πRAD in a spatially structured model. Colours indicate the divergence time between sub-populations (t), measured in units of 4Ne (that is, the ratio between the divergence time and the average coalescence time). Solid lines represent local linear regressions. The figure shows that the underestimation of genetic diversity is weaker when sub-populations are more divergent.
Fig. 4
A comparison between simulations and empirical data in highly diverse populations. Distributions show the πRAD values expected under the neutral panmictic model with θ = 9.7 % (American population, left) and θ = 7.4 % (Russian population, right). The black arrows indicate the true polymorphism values (πtrue) in the two populations. The grey arrows indicate the observed πRAD values. Each distribution was computed from 400 simulations.
Fig. 5
ABC corrections of the RAD-seq bias. The figure shows the relation between πtrue and the corrected πRAD values, that is, the θ parameter estimated by ABC. Black dots correspond to simulated data (cross-validation). Green dots represent Drosophila melanogaster populations. Blue and red dots represent American and Russian populations of Schizophyllum commune, respectively.
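The three heterogeneous-θ models of Fig. 2 can be summarised by the mean and the coefficient of variation (CV) of θ/θref across loci, computed directly from the fractions listed in the caption. The CV is our choice of heterogeneity summary, not a statistic reported by the authors, and the dictionary layout below is purely illustrative.

```python
import math

# Fraction of loci and theta / theta_ref for each class, as listed in the Fig. 2 caption.
MODELS = {
    "Model 1": [(0.7, 1.0), (0.2, 1 / 2), (0.1, 1 / 10)],
    "Model 2": [(0.7, 1.0), (0.2, 1 / 10), (0.1, 1 / 100)],
    "Model 3": [(0.5, 1.0), (0.4, 1 / 10), (0.1, 1 / 100)],
}


def mean_and_cv(classes):
    """Mean and coefficient of variation of theta / theta_ref over loci."""
    mean = sum(f * r for f, r in classes)
    second_moment = sum(f * r * r for f, r in classes)
    return mean, math.sqrt(second_moment - mean**2) / mean


summary = {name: mean_and_cv(classes) for name, classes in MODELS.items()}
for name, (mean, cv) in summary.items():
    print(f"{name}: mean theta/theta_ref = {mean:.3f}, CV = {cv:.3f}")
```

The mean drops from 0.81 θref (Model 1) to 0.721 θref (Model 2) and 0.541 θref (Model 3), while the CV rises from about 0.38 to 0.59 and 0.85. This matches the caption's statement that heterogeneity increases across the three models.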
