Estimating and accounting for genotyping errors in RAD-seq experiments

Mol Ecol Resour. 2020 Jul;20(4):856-870. doi: 10.1111/1755-0998.13153. Epub 2020 Apr 6.

Abstract

In non-model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques because they are inexpensive, easy to use, and do not require genomic resources such as a reference genome. However, evidence is accumulating that such techniques may be affected by specific biases, which calls into question the accuracy of the obtained genotypes and, as a consequence, their usefulness in evolutionary studies. Here, we introduce three strategies to estimate genotyping error rates from such data: by comparison with high-quality genotypes obtained with a different technique, from individual replicates, or from a population sample when assuming Hardy-Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD-seq), arguably the most popular reduced representation sequencing technique, revealed per-allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome-wide and local ancestry of well-characterized hybrids of two Eurasian poplar (Populus) species, such high error rates may lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD-seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids. Based on these findings, we strongly recommend carefully assessing genotyping error rates in reduced representation sequencing experiments and properly accounting for them in downstream analyses, for instance using the tools presented here.
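To make the first strategy concrete, the following is a minimal sketch of how genotype calls might be compared with high-quality reference genotypes, with discordance tallied separately for truly homozygous and truly heterozygous sites. It is not the authors' estimator or their tool: the function name `error_rates_vs_truth`, the 0/1/2 genotype coding (count of alternate alleles), and the use of -1 for missing calls are all assumptions made for illustration, and it reports a simple per-genotype discordance rather than the per-allele error rates described in the abstract.

```python
from typing import Dict, Sequence


def error_rates_vs_truth(called: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Per-genotype discordance, split by the true genotype class.

    Assumed (hypothetical) coding: 0 and 2 are homozygous, 1 is heterozygous,
    and any negative value marks a missing genotype.
    """
    # For each class, track [number of discordant calls, number of compared sites].
    counts = {"hom": [0, 0], "het": [0, 0]}
    for c, t in zip(called, truth):
        if c < 0 or t < 0:
            continue  # skip sites missing in either data set
        key = "het" if t == 1 else "hom"
        counts[key][1] += 1
        if c != t:
            counts[key][0] += 1
    # Return NaN for a class with no comparable sites.
    return {k: (e / n if n else float("nan")) for k, (e, n) in counts.items()}


# Toy data, invented for illustration only.
truth = [0, 1, 1, 2, 1, 0, 1, 2]
called = [0, 0, 1, 2, 2, 0, 1, -1]
rates = error_rates_vs_truth(called, truth)
print(rates)
```

In this toy input, two of the four truly heterozygous sites are called homozygous while all compared homozygous sites agree, which mirrors the kind of asymmetry the abstract reports for heterozygous sites.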

Keywords: Populus; RAD-seq; genotype likelihoods; genotyping; genotyping errors.

MeSH terms

  • Alleles
  • Chromosome Mapping / methods
  • Genome-Wide Association Study / methods
  • Genomics / methods
  • Genotype
  • Genotyping Techniques / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Populus / genetics
  • Sequence Analysis, DNA / methods*