Efficient and cost effective population resequencing by pooling and in-solution hybridization

PLoS One. 2011 Mar 30;6(3):e18353. doi: 10.1371/journal.pone.0018353.

Abstract

High-throughput sequencing of targeted genomic loci in large populations is an effective approach for evaluating the contribution of rare variants to disease risk. We evaluated the feasibility of using in-solution hybridization-based target capture on pooled DNA samples to enable cost-efficient population sequencing studies. For this, we performed pooled sequencing of 100 HapMap samples across ∼600 kb of DNA sequence using the Illumina GAIIx. Using our accurate variant calling method for pooled sequence data, we were able to not only identify single nucleotide variants with a low false discovery rate (<1%) but also accurately detect short insertion/deletion variants. In addition, with sufficient coverage per individual in each pool (30-fold) we detected 97.2% of the total variants and 93.6% of variants below 5% in frequency. Finally, allele frequencies for single nucleotide variants (SNVs) estimated from the pooled data and the HapMap genotype data were tightly correlated (correlation coefficient ≥ 0.995).
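The comparison of pooled and genotype-based allele frequencies can be illustrated with a minimal Python sketch. It is not the authors' variant calling method, which the abstract does not describe. It assumes a naive estimator (the fraction of reads in a pool that carry the alternate allele) and diploid genotypes coded as alternate-allele counts. All function names and the example numbers are hypothetical.

```python
from math import sqrt


def pooled_allele_frequency(alt_reads, total_reads):
    """Naive pooled estimate: fraction of reads carrying the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def genotype_allele_frequency(genotypes):
    """Allele frequency from diploid genotypes coded as 0, 1, or 2 alternate alleles."""
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    return sum(genotypes) / (2 * len(genotypes))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical read counts (alt, total) at three sites in one pool.
    pooled = [pooled_allele_frequency(a, t) for a, t in [(30, 600), (150, 600), (290, 580)]]
    # Hypothetical genotype-derived frequencies at the same three sites.
    reference = [0.05, 0.26, 0.49]
    print(pooled, pearson(pooled, reference))
```

A read-fraction estimator of this kind ignores sequencing error and unequal contribution of individuals to the pool, which is why a dedicated pooled variant caller is normally preferred in practice.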

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cost-Benefit Analysis
  • Gene Frequency / genetics
  • Genetics, Population / methods*
  • Genome, Human / genetics
  • Humans
  • INDEL Mutation / genetics
  • Nucleic Acid Hybridization / methods*
  • Polymorphism, Single Nucleotide / genetics
  • Sequence Analysis, DNA / economics*
  • Sequence Analysis, DNA / methods*
  • Solutions

Substances

  • Solutions