Ascertainment biases in SNP chips affect measures of population divergence

Mol Biol Evol. 2010 Nov;27(11):2534-47. doi: 10.1093/molbev/msq148. Epub 2010 Jun 17.

Abstract

Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have utilized these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as F(ST) or principal component analyses. However, the single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of the ascertainment bias on inferences regarding genetic differentiation among populations in one of the common genome-wide genotyping platforms. We generate SNP genotyping data for individuals that previously have been subject to partial genome-wide Sanger sequencing and compare inferences based on genotyping data to inferences based on direct sequencing. In addition, we also analyze publicly available genome-wide data. We demonstrate that the ascertainment biases will distort measures of human diversity and possibly change conclusions drawn from these measures in some times unexpected ways. We also show that details of the genotyping calling algorithms can have a surprisingly large effect on population genetic inferences. We not only present a correction of the spectrum for the widely used Affymetrix SNP chips but also show that such corrections are difficult to generalize among studies.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • African Americans / genetics
  • Asian Continental Ancestry Group / genetics
  • Bias
  • Europe
  • Genetic Variation*
  • Genetics, Population*
  • Geography
  • Haplotypes / genetics
  • Humans
  • Japan
  • Likelihood Functions
  • Oligonucleotide Array Sequence Analysis*
  • Polymorphism, Single Nucleotide / genetics*
  • Principal Component Analysis
  • Sequence Analysis, DNA