Discovery and genotyping of genome structural polymorphism by sequencing on a population scale

Nat Genet. 2011 Mar;43(3):269-76. doi: 10.1038/ng.768. Epub 2011 Feb 13.


Accurate and complete analysis of genome variation in large populations will be required to understand the role of genome variation in complex disease. We present an analytical framework for characterizing genome deletion polymorphism in populations using sequence data that are distributed across hundreds or thousands of genomes. Our approach uses population-level concepts to reinterpret the technical features of sequence data that often reflect structural variation. In the 1000 Genomes Project pilot, this approach identified deletion polymorphism across 168 genomes (sequenced at 4× average coverage) with sensitivity and specificity unmatched by other algorithms. We also describe a way to determine the allelic state or genotype of each deletion polymorphism in each genome; the 1000 Genomes Project used this approach to type 13,826 deletion polymorphisms (48 to 995,664 bp) at high accuracy in populations. These methods offer a way to relate genome structural polymorphism to complex disease in populations.
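
To make the genotyping idea concrete, the following is a minimal sketch of how Bayes' theorem could assign a copy-number genotype to one deletion locus in one low-coverage genome. It is an illustration under stated assumptions, not the method described in the paper: it uses read depth only, a Poisson model for read counts, and a Hardy-Weinberg prior taken from a population allele frequency. The function name `genotype_posteriors` and the parameters `expected_per_copy`, `del_freq`, and `background` are hypothetical.

```python
import math


def genotype_posteriors(read_count, expected_per_copy, del_freq, background=0.5):
    """Posterior probability of copy number 0, 1, or 2 at a deletion locus.

    Assumptions (illustrative only, not taken from the paper):
      - reads inside the locus are Poisson distributed,
      - the mean scales linearly with copy number, plus a small
        background term for mismapped reads,
      - the prior follows Hardy-Weinberg proportions for a deletion
        allele of frequency del_freq in the population.
    """
    if not 0.0 < del_freq < 1.0:
        raise ValueError("del_freq must be strictly between 0 and 1")

    q = del_freq
    priors = {0: q * q, 1: 2 * q * (1 - q), 2: (1 - q) ** 2}

    log_post = {}
    for copies, prior in priors.items():
        lam = copies * expected_per_copy + background
        # Poisson log-likelihood of the observed read count.
        log_lik = read_count * math.log(lam) - lam - math.lgamma(read_count + 1)
        log_post[copies] = math.log(prior) + log_lik

    # Normalize in log space to avoid underflow.
    peak = max(log_post.values())
    weights = {c: math.exp(v - peak) for c, v in log_post.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


if __name__ == "__main__":
    # A genome with no reads in a locus where one copy yields about 20 reads.
    post = genotype_posteriors(read_count=0, expected_per_copy=20, del_freq=0.2)
    print(max(post, key=post.get))
```

At 4× coverage a short deletion may be covered by very few reads in any single genome, so a population-derived prior of this kind carries much of the weight when the per-genome evidence is weak.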

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Bayes Theorem
  • Genetics, Population
  • Genome, Human
  • Genotype
  • Humans
  • Polymorphism, Genetic*
  • Sequence Analysis, DNA / methods*