SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries

Nat Methods. 2008 Mar;5(3):247-52. doi: 10.1038/nmeth.1185. Epub 2008 Feb 24.


High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, and allele frequencies estimated from genotypes and from sequence reads were correlated at r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.
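The abstract compares allele frequencies predicted from sequence reads with those obtained by genotyping the same animals. A minimal Python sketch of that kind of comparison follows. It rests on assumptions that are ours, not the paper's. Each SNP is summarized by counts of reads carrying the reference and the alternate allele, and the sequence-based frequency is simply the alternate-read fraction. Genotypes are coded as 0, 1 or 2 copies of the alternate allele per diploid animal, and agreement is measured with a Pearson correlation. The function names and the toy numbers are hypothetical and are not data from the study.

```python
"""Sketch: sequence-based vs genotype-based allele frequencies.

Hypothetical illustration only, not the published pipeline. Assumes reads
covering a SNP behave like a random sample of the alleles present in the
sequenced animals.
"""
from math import sqrt
from typing import Sequence


def sequence_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Alternate-allele frequency estimated as the fraction of alt reads."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / depth


def genotype_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded 0/1/2."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(genotypes) / (2 * len(genotypes))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length frequency vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy (ref, alt) read counts and genotypes for three hypothetical SNPs.
    read_counts = [(18, 2), (10, 10), (5, 15)]
    genotypes = [[0, 0, 1, 0], [1, 1, 0, 2], [2, 1, 2, 1]]

    seq_af = [sequence_allele_frequency(r, a) for r, a in read_counts]
    gen_af = [genotype_allele_frequency(g) for g in genotypes]
    print(seq_af)  # [0.1, 0.5, 0.75]
    print(gen_af)  # [0.125, 0.5, 0.75]
    print(pearson_r(seq_af, gen_af))
```

With real data the read depth per SNP is often low, so the read-fraction estimate is noisy. That sampling noise is one plausible reason a sequence-based estimate would correlate only moderately with genotype-based frequencies.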

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Animals
  • Cattle
  • Computational Biology / methods*
  • Gene Frequency*
  • Genomic Library
  • Genotype
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA / methods*