Impact and quantification of the sources of error in DNA pooling designs

Ann Hum Genet. 2009 Jan;73(1):118-24. doi: 10.1111/j.1469-1809.2008.00486.x. Epub 2008 Oct 15.


The analysis of genome-wide variation offers the possibility of unravelling the genes involved in the pathogenesis of disease. Genome-wide association studies are also particularly useful for identifying and validating targets for therapeutic intervention, as well as for detecting markers for drug efficacy and side effects. The cost of such large-scale genetic association studies may be reduced substantially by the analysis of pooled DNA from multiple individuals. However, experimental errors inherent in pooling studies lead to a potential increase in the false positive rate and a loss in power compared to individual genotyping. Here we quantify various sources of experimental error with two statistical methods, using empirical data from typical pooling experiments and the corresponding individual genotyping counts. We provide analytical formulas for calculating these different errors in the absence of complete information, such as replicate pool formation, and for adjusting for the errors in the statistical analysis. We demonstrate that DNA pooling can estimate allele frequencies accurately, and that adjusting the pooled allele frequency estimates for differential allelic amplification considerably improves accuracy. Estimates of the components of error show that differential allelic amplification is the most important contributor to the error variance in absolute allele frequency estimation, followed by allele frequency measurement and pool formation errors. Our results emphasise the importance of minimising experimental errors and obtaining correct error estimates in genetic association studies.
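
The abstract does not state the formula used to adjust for differential allelic amplification. As an illustration only, the sketch below uses a correction that is common in the DNA pooling literature: the signal of one allele is rescaled by a factor k, taken as the mean ratio of the two allele signals in individually genotyped heterozygotes (where the true ratio is 1:1). The function names, the example signal values, and the choice of this particular correction are assumptions and may differ from the method used in the paper.

```python
def estimate_k(heterozygote_signals):
    """Mean A/B signal ratio across heterozygous individuals.

    heterozygote_signals: list of (signal_a, signal_b) pairs (hypothetical data).
    With equal amplification of both alleles the ratio would be 1.
    """
    if not heterozygote_signals:
        raise ValueError("need at least one heterozygote measurement")
    ratios = [a / b for a, b in heterozygote_signals]
    return sum(ratios) / len(ratios)


def corrected_frequency(signal_a, signal_b, k=1.0):
    """Frequency of allele A in a pool, rescaling allele B's signal by k.

    k = 1.0 gives the unadjusted estimate signal_a / (signal_a + signal_b).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    denominator = signal_a + k * signal_b
    if denominator <= 0:
        raise ValueError("signals must sum to a positive value")
    return signal_a / denominator


# Hypothetical pool: true frequency of A is 0.4, but A amplifies 1.5x better.
k = estimate_k([(1.5, 1.0), (3.0, 2.0)])        # both heterozygotes give 1.5
raw = corrected_frequency(0.6, 0.6)             # unadjusted estimate
adjusted = corrected_frequency(0.6, 0.6, k)     # adjusted estimate
print(f"k={k:.2f} raw={raw:.2f} adjusted={adjusted:.2f}")
```

In this made-up case the unadjusted estimate is biased upward to 0.5, while the adjusted estimate recovers the assumed true frequency of 0.4. The residual error components named in the abstract (measurement and pool formation error) are not modelled here.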

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • DNA / genetics*
  • DNA / standards
  • Gene Amplification
  • Gene Frequency
  • Genetics, Population / standards*
  • Genetics, Population / statistics & numerical data
  • Genome-Wide Association Study / standards*
  • Genome-Wide Association Study / statistics & numerical data
  • Genotype
  • Humans
  • Polymorphism, Single Nucleotide
  • Research Design / standards*
  • Research Design / statistics & numerical data
  • Selection Bias


Substances

  • DNA