Nat Genet. 2015 Mar;47(3):296-303.
doi: 10.1038/ng.3200. Epub 2015 Jan 26.

Large Multiallelic Copy Number Variations in Humans

Free PMC article

Robert E Handsaker et al. Nat Genet.

Abstract

Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage (seven times the combined contribution of deletions and biallelic duplications) and that this variation in gene dosage generates abundant variation in gene expression. We describe 'runaway duplication haplotypes' in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.

Figures

Figure 1
Ascertainment of multi-allelic copy number variations (mCNVs) across the human genome. Multi-modal patterns of variation for a high-frequency CNV (orange box represents the true extent of the CNV) can be detected in multiple windows (w1 – w6) that overlap the CNV segment (a). Where read-depth distributions from adjacent windows are highly correlated across many genomes, these windows are merged to increase power for genotyping (b). To more precisely estimate the genome sequence affected (c) many candidate intervals (green bars, i1 – i4) are tested; intervals for which the data most strongly coalesce to integer genotypes with high posterior likelihoods define the estimated CNV boundaries (i3).
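The window-merging step in panel (b) can be illustrated with a minimal sketch. It assumes a matrix of normalized read depth per window and per genome, and it merges adjacent windows whose depths have a Pearson correlation at or above a cutoff. The function names and the 0.9 cutoff are hypothetical choices for illustration; the caption does not state the criterion the authors used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def merge_correlated_windows(depth, min_r=0.9):
    """Group adjacent windows whose read depth is correlated across genomes.

    depth[w][s] is the normalized read depth of genome s in window w.
    min_r is an assumed cutoff, not a value taken from the paper.
    Returns a list of (first_window, last_window) index pairs.
    """
    if not depth:
        return []
    groups = []
    start = 0
    for w in range(1, len(depth)):
        # Close the current group when the next window stops tracking it.
        if pearson(depth[w - 1], depth[w]) < min_r:
            groups.append((start, w - 1))
            start = w
    groups.append((start, len(depth) - 1))
    return groups
```

Comparing only neighboring windows keeps the sketch linear in the number of windows; a fuller treatment might compare each candidate window against the pooled depth of the group built so far.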
Figure 2
Determination of the copy-number levels and alleles present at mCNV loci. Histograms of normalized read depth (a) are fitted with a Gaussian mixture model to infer integer copy number level (“genotype”) for 849 genomes. Colors represent copy number calls at 95% confidence; samples in gray have less-confident copy number calls. Similar plots for all mCNVs ascertained in this study are provided in the supplementary web resource. (b) Distribution of observed diploid copy numbers across 8,659 CNVs ascertained in this study. The size of each circle represents the number of CNVs in each category. Colors indicate the minimum number of copy-number alleles necessary to represent the dynamic range of copy number variation observed at each site. For example, the blue circles represent deletions and duplications, and the various other colors represent classes of multi-allelic CNVs with various copy-number ranges and numbers of alleles.
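The genotyping idea in panel (a) can be sketched as a simplified Gaussian mixture. This is not the authors' model: it assumes depth is already scaled so that a value of 2.0 corresponds to two copies, pins each component mean to an integer copy number, uses one shared standard deviation, and estimates only the mixing weights by EM. All function names and the value of `sd` are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

def posteriors(depth, weights, sd):
    """Posterior probability of each integer copy number for one genome."""
    likes = {cn: w * normal_pdf(depth, cn, sd) for cn, w in weights.items()}
    total = sum(likes.values())
    if total == 0.0:
        # Depth is far from every component; report no support at all.
        return {cn: 0.0 for cn in weights}
    return {cn: v / total for cn, v in likes.items()}

def fit_weights(depths, copy_numbers, sd, n_iter=50):
    """EM over mixing weights only; means are fixed at the integer copy numbers."""
    weights = {cn: 1.0 / len(copy_numbers) for cn in copy_numbers}
    for _ in range(n_iter):
        totals = {cn: 0.0 for cn in copy_numbers}
        for d in depths:
            for cn, p in posteriors(d, weights, sd).items():
                totals[cn] += p
        weights = {cn: totals[cn] / len(depths) for cn in copy_numbers}
    return weights

def call_copy_number(depth, weights, sd, min_conf=0.95):
    """Return (copy_number, posterior, is_confident) for one genome."""
    post = posteriors(depth, weights, sd)
    best = max(post, key=post.get)
    return best, post[best], post[best] >= min_conf
```

A genome whose depth falls midway between two integer levels receives a split posterior and fails the 95% threshold, which corresponds to the gray, less-confident samples described in the caption.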
Figure 3
Critical evaluation of copy number genotypes by droplet digital PCR (ddPCR). Across 38 genomes evaluated, copy number genotypes from sequencing data were compared with measurements from ddPCR. Panel (a) shows data for 6 of the 22 loci evaluated. Plots for all loci are in Supplementary Figure 4. Across the 22 loci, the two methods showed 99.9% genotype concordance at confidently called sites.
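A concordance figure of this kind can be computed as in the sketch below. It assumes each ddPCR measurement is a real-valued copy-number estimate that is rounded to the nearest integer, and that genomes without a confident sequencing call are marked `None` and excluded. The function name and these conventions are illustrative assumptions, not details given in the caption.

```python
from math import floor

def genotype_concordance(seq_calls, ddpcr_values):
    """Fraction of confidently called genotypes that match rounded ddPCR values.

    seq_calls: integer copy numbers from sequencing, or None if not confident.
    ddpcr_values: real-valued ddPCR copy-number measurements, same order.
    Returns NaN when no genotype was confidently called.
    """
    compared = 0
    matches = 0
    for call, measured in zip(seq_calls, ddpcr_values):
        if call is None:
            continue  # skip low-confidence sequencing calls
        compared += 1
        # Round half up, so 2.5 maps to 3 rather than Python's banker's rounding.
        if call == floor(measured + 0.5):
            matches += 1
    return matches / compared if compared else float("nan")
```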
Figure 4
Relationship of gene copy number (in genomic DNA) to gene expression (in mRNA) for multi-allelic CNVs. (a) At four typical genic mCNVs, inter-individual variation in gene expression levels appears to arise strongly from gene dosage variation. For clarity, individual points show data for all 310 genomes that have corresponding RNA data (not just outliers) overlaid on a summary box plot showing median, inter-quartile range, and whiskers extending to the most extreme point no more than 1.5 IQR from the box edge (Tukey convention). Supplementary Figure 8 shows such analyses for many more genes. (b) Across all genic mCNV loci that are expressed in lymphoblastoid cell lines, the distribution of p-values (in tests for positive correlation between gene expression and gene dosage) is dominated by low p-values. P-values calculated using 10,000 random permutations (Supplementary Note).
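A one-sided permutation test for positive correlation, of the general kind mentioned in panel (b), might look like the sketch below. The test statistic (Pearson correlation), the add-one correction in the p-value, and the function names are assumptions made for illustration; the authors' exact procedure is described in their Supplementary Note and is not reproduced here.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def permutation_p_value(dosage, expression, n_perm=10_000, seed=0):
    """One-sided p-value for positive correlation between dosage and expression.

    Expression values are shuffled across genomes to break any link to dosage;
    the p-value is the share of shuffles with correlation >= the observed one.
    """
    rng = random.Random(seed)
    observed = pearson(dosage, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(dosage, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With 10,000 permutations the smallest reportable p-value under this convention is about 1e-4, so the resolution of the low end of the p-value distribution depends on the number of permutations.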
Figure 5
Relationship between imputability of mCNVs and features of each mCNV locus. Imputability from current reference panels (here measured by imputed dosage r²) relates to multiple features of each mCNV, including (a) the copy-number range of the mCNV (the difference between the highest and lowest observed diploid copy numbers); (b) the number of common (MAF > 1%) copy-number alleles segregating at the site; (c) the mean copy number of the mCNV; and (d) the combined frequency of all copy-number alleles after the two most common. All quantities were calculated for the EUR population cohort for 184 mCNVs with MAF > 1%. In panels a and b, a small amount of random variation is added to the discrete x-axis values to aid visualization. CNVs for which at least one individual SNP showed even partial correlation to copy number (p < 10⁻³) in the EUR population are plotted in blue, CNVs lacking such SNPs in gray.
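The quantities plotted in this figure can be written down in a short sketch. It assumes dosage r² is the squared Pearson correlation between the imputed dosage and the measured diploid copy number, which is a common convention but is not spelled out in the caption. The function names and input layouts are hypothetical.

```python
from math import sqrt

def dosage_r2(true_copy_number, imputed_dosage):
    """Squared Pearson correlation between measured and imputed copy number."""
    n = len(true_copy_number)
    mx = sum(true_copy_number) / n
    my = sum(imputed_dosage) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(true_copy_number, imputed_dosage))
    sxx = sum((a - mx) ** 2 for a in true_copy_number)
    syy = sum((b - my) ** 2 for b in imputed_dosage)
    if sxx == 0 or syy == 0:
        return 0.0
    r = sxy / sqrt(sxx * syy)
    return r * r

def copy_number_range(diploid_copy_numbers):
    """Panel (a): highest minus lowest observed diploid copy number."""
    return max(diploid_copy_numbers) - min(diploid_copy_numbers)

def mean_copy_number(diploid_copy_numbers):
    """Panel (c): mean diploid copy number across genomes."""
    return sum(diploid_copy_numbers) / len(diploid_copy_numbers)

def minor_allele_mass(allele_counts):
    """Panel (d): combined frequency of all alleles after the two most common.

    allele_counts maps a copy-number allele to its count of chromosomes.
    """
    counts = sorted(allele_counts.values(), reverse=True)
    return sum(counts[2:]) / sum(counts)
```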
Figure 6
Haplotypes with “runaway” copy number. (a) Copy number distribution and haplotype structure of a multi-allelic CNV encompassing the HPR gene. About 25% of the non-admixed African individuals sampled by the 1000 Genomes Project exhibit HPR copy numbers greatly increased (4-8) relative to those observed in individuals sampled from all the non-African populations (generally no more than 2). The branching green plots on the right show SNP haplotypes in the region around the HPR locus, in chromosomes sampled from African populations (YRI and LWK). The origin in the middle of the haplotype plot corresponds to the edges of the HPR mCNV; the branches show places at which flanking haplotypes begin to diverge due to mutation or recombination. The thickness of each branch indicates haplotype frequency; shading indicates allele frequency of the individual SNPs used to define haplotypes. Haplotypes carrying high-copy HPR alleles (with more than one HPR copy) are indicated by black lines at branch tips with a line segment for each extra copy above one. Almost all the high-copy alleles appear to segregate on the same haplotype background. (b) A similar analysis of a mCNV affecting the ORM1 gene, which appears to have greatly expanded in copy number on a specific haplotype, producing many different high-copy alleles.


