Bayesian EM algorithm for scoring polymorphic deletions from SNP data and application to a common CNV on 8q24

Genet Epidemiol. 2009 May;33(4):357-68. doi: 10.1002/gepi.20391.


Copy number variations (CNVs) in the human genome provide exciting candidates for functional polymorphisms. Association between CNV carrier status and disease status can now be assessed by evaluating the signal intensity of SNP genotyping assays. Here, we present a novel statistical method designed to perform such inference and apply it to a known CNV in a bipolar disorder linkage region. Using Bayesian computations, we calculate the posterior probability of CNV carrier status for each individual in a sample by jointly analyzing genotype information and hybridization intensity. We model the signal intensity as a mixture of normal distributions, allowing for locus-specific and allele-specific distributions. Using an expectation-maximization (EM) algorithm, we estimate the parameters of these distributions and use the estimates to infer the carrier status of each individual and the boundaries of the CNV. We applied the method to a previously described common deletion on 8q24, a region consistently showing linkage to bipolar disorder, in a sample of 3,512 individuals and unambiguously inferred 172 heterozygous deletion carriers and 1 homozygous deletion carrier. We observed no significant association between bipolar disorder and carrier status. We carefully assessed the validity of the inferred carrier status and observed no indication of errors. Furthermore, the algorithm precisely identifies the boundaries of the CNV. Finally, we assessed the power of the algorithm to detect shorter CNVs by sub-sampling from the SNPs covered by this deletion, demonstrating that our EM algorithm produces precise estimates of carrier status.
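
The idea of estimating a normal mixture by EM and reading off posterior carrier probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes only two classes (non-carrier and heterozygous deletion carrier), uses locus-specific means and standard deviations but ignores genotype information and allele-specific distributions, and treats SNPs as independent given carrier status. The function names `em_carrier_posteriors` and `simulate_intensities`, the quantile-based initialization, and all simulation parameters are hypothetical choices made for illustration.

```python
import numpy as np


def em_carrier_posteriors(x, n_iter=100, tol=1e-8):
    """Two-class normal mixture fitted by EM (illustrative sketch only).

    x: array of shape (n_individuals, n_snps) holding signal intensities.
    Returns (posterior carrier probability per individual, class weights,
    locus-specific means, locus-specific standard deviations).
    """
    n, _ = x.shape
    # Assumed initialization: treat the 10% of individuals with the lowest
    # mean intensity as provisional carriers.
    row_mean = x.mean(axis=1)
    guess = row_mean < np.quantile(row_mean, 0.1)
    resp = np.column_stack([~guess, guess]).astype(float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: weighted class proportions and per-SNP means and variances.
        weight = resp.sum(axis=0) + 1e-12
        pi = weight / n
        mu = (resp.T @ x) / weight[:, None]
        var = np.stack([
            (resp[:, c, None] * (x - mu[c]) ** 2).sum(axis=0) / weight[c]
            for c in range(2)
        ])
        var = np.maximum(var, 1e-6)  # guard against collapsing variances
        # E-step: per-individual log-likelihood of each class, summed over SNPs.
        log_dens = -0.5 * (
            np.log(2 * np.pi * var)[None]
            + (x[:, None, :] - mu[None]) ** 2 / var[None]
        )
        log_joint = np.log(pi)[None] + log_dens.sum(axis=2)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return resp[:, 1], pi, mu, np.sqrt(var)


def simulate_intensities(n=500, n_carriers=25, n_snps=10, shift=-0.7,
                         sd=0.25, seed=0):
    """Simulate intensities; carriers are shifted down at every SNP.

    All parameter values here are arbitrary assumptions for the demo.
    """
    rng = np.random.default_rng(seed)
    is_carrier = np.zeros(n, dtype=bool)
    is_carrier[:n_carriers] = True
    x = rng.normal(0.0, sd, size=(n, n_snps))
    x[is_carrier] += shift
    return x, is_carrier


if __name__ == "__main__":
    x, truth = simulate_intensities()
    post, pi, mu, sigma = em_carrier_posteriors(x)
    print("inferred carriers:", int((post > 0.5).sum()))
    print("estimated carrier frequency:", round(float(pi[1]), 3))
```

Because the class indicator is shared across all SNPs of an individual, evidence accumulates over loci; using fewer columns of `x` mimics, in a loose sense, the sub-sampling of SNPs used to study shorter CNVs.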

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Bipolar Disorder / genetics
  • Chromosome Deletion
  • Chromosomes, Human, Pair 8 / genetics*
  • Epidemiologic Methods
  • Gene Dosage
  • Genetic Variation*
  • Genome-Wide Association Study
  • Heterozygote
  • Homozygote
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Sequence Deletion