An expectation-maximization program for determining allelic spectrum from CNV data (CoNVEM): insights into population allelic architecture and its mutational history

Hum Mutat. 2010 Apr;31(4):414-20. doi: 10.1002/humu.21199.


Copy number variations (CNVs) are a common form of genetic variation in which the allelic population contains a distribution of copy numbers of a particular gene (or other large sequence or region). The simplest forms describe deletion (0 vs. 1 copy) or duplication (1 vs. 2 copies) events. However, some CNV loci contain a much wider range of copy numbers, such as that seen for the CCL3L1 locus. CNV classification methods typically describe only the total (diploid) copy number, leaving the underlying genotypic and allelic frequency distribution unknown. We have developed an expectation-maximization approach for the analysis of data from tandem CNVs that enables estimation of both the allelic copy number frequency distribution and the expected copy number genotype and class distribution under Hardy-Weinberg equilibrium (HWE). The CNV expectation-maximization algorithm is available as a Web tool (CoNVEM), which graphically and numerically presents CNV allele and genotype distributions. We have applied this approach to the analysis of salivary amylase (AMY1A, B, and C), CCL3L1, and SULT1A1 CNVs using published data, and present inferences about the evolutionary history of these loci based on CoNVEM results.
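The general idea of estimating allelic copy number frequencies from diploid copy number classes can be illustrated with a minimal sketch. This is not the CoNVEM implementation. It assumes only what the abstract states (total diploid copy number is observed, and genotypes follow HWE), and the function names `em_allele_freqs` and `hwe_class_dist`, the uniform starting point, and the convergence settings are hypothetical choices made for illustration.

```python
def hwe_class_dist(p):
    """Expected distribution of diploid copy number classes under HWE.

    p[i] is the frequency of the allele carrying i copies. The class
    d = i + j has probability sum of p[i] * p[j] over ordered allele pairs.
    """
    dist = [0.0] * (2 * len(p) - 1)
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            dist[i + j] += pi * pj
    return dist


def em_allele_freqs(class_counts, max_allele, n_iter=1000, tol=1e-12):
    """Illustrative EM estimate of allelic copy number frequencies.

    class_counts maps a diploid copy number to the number of individuals
    observed in that class. max_allele is an assumed upper bound on the
    copy number carried by a single allele.
    """
    k = max_allele + 1
    if sum(class_counts.values()) <= 0:
        raise ValueError("class_counts must contain at least one individual")
    for d in class_counts:
        if not 0 <= d <= 2 * max_allele:
            raise ValueError(f"class {d} is impossible with max_allele={max_allele}")

    p = [1.0 / k] * k  # assumed starting point: uniform allele frequencies
    for _ in range(n_iter):
        expected = [0.0] * k
        for d, n in class_counts.items():
            # E-step: split the n individuals in class d across the ordered
            # allele pairs (i, d - i) in proportion to p[i] * p[d - i].
            pairs = [(i, d - i) for i in range(k) if 0 <= d - i < k]
            weights = [p[i] * p[j] for i, j in pairs]
            z = sum(weights)
            if z == 0.0:
                continue  # class has no support under the current estimate
            for (i, j), w in zip(pairs, weights):
                expected[i] += n * w / z
                expected[j] += n * w / z
        # M-step: renormalize the expected allele counts to frequencies.
        total = sum(expected)
        new_p = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p
```

For a simple deletion locus (alleles with 0 or 1 copy), the diploid classes determine the genotypes directly, so calling `em_allele_freqs({0: 9, 1: 42, 2: 49}, max_allele=1)` recovers the allele frequencies by simple counting. For loci with a wider copy number range, several genotypes map to the same class, the estimate depends on the starting point and on the assumed `max_allele`, and the result should be read as one HWE-consistent solution rather than a unique answer.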

MeSH terms

  • Algorithms*
  • Alleles*
  • Arylsulfotransferase / genetics
  • Computational Biology / methods*
  • DNA Copy Number Variations / genetics*
  • Genetics, Population*
  • Humans
  • Mutation / genetics*
  • Salivary alpha-Amylases / genetics


Substances

  • Arylsulfotransferase
  • SULT1A1 protein, human
  • Salivary alpha-Amylases