Nature, 464 (7289), 704-12

Origins and Functional Impact of Copy Number Variation in the Human Genome

Donald F Conrad et al. Nature.

Abstract

Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

Figures

Figure 1. Overview of experimental strategy for CNV discovery and genotyping
Overview of the discovery and genotyping phases of this project, with the former generating a new map of CNV locations and the latter allowing a reference set of CNV genotypes to be constructed. Data are available at the Database of Genomic Variants and http://www.sanger.ac.uk/humgen/cnv/42mio.
Figure 2. Functional impact of CNVs by type, frequency and population
a, Impact on genes of sets of CNVs at different stages of characterization (candidate, validated, validated/genotyped loci). Genotyped CNVs are split into different classes (deletion, duplication and multiallelic). b, Impact on genes of CNV classes based on population frequency. Frequency classes: common (MAF ≥ 0.1 in any population), intermediate (0.1 > MAF > 0.01), rare (MAF ≤ 0.01 in all populations). ASN denotes JPT+CHB.
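The frequency classes in panel b can be sketched as a small classifier. This is only an illustration: the function name `frequency_class` and the dictionary input are hypothetical, and the sketch assumes that the "intermediate" class is decided by the highest MAF across populations, which the caption does not state explicitly.

```python
def frequency_class(maf_by_population):
    """Assign a CNV to a frequency class from its per-population MAFs.

    maf_by_population: dict mapping a population label (e.g. "CEU") to the
    minor allele frequency observed in that population.
    Assumption: the class is decided by the highest MAF across populations.
    """
    if not maf_by_population:
        raise ValueError("at least one population MAF is required")

    highest = max(maf_by_population.values())
    if highest >= 0.1:
        # Common: MAF >= 0.1 in any population
        return "common"
    if highest <= 0.01:
        # Rare: MAF <= 0.01 in all populations
        return "rare"
    # Everything in between: 0.1 > MAF > 0.01
    return "intermediate"


if __name__ == "__main__":
    print(frequency_class({"CEU": 0.12, "YRI": 0.0, "ASN": 0.005}))
```

Under this reading, a CNV that is common in one population and absent in the others is still classed as common.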
Figure 3. DNA sequence context enrichments around CNV breakpoints
Thirty DNA sequence motifs thought to be associated with genome instability were compared to estimated CNV breakpoints. a, The proportion of CNV breakpoint regions containing each motif was plotted separately for deletions (green circles) and duplications (red circles). Motifs generated through machine-learning in the current study are indicated with green labels, and the remainder are from the literature. Asterisks denote motifs that show significant enrichment in duplication breakpoints compared to deletion breakpoints; ‘+’ denotes motifs that are significantly overrepresented in the total set of CNV breakpoint sequences compared to matched control sequence. b, Density of Alu signal recognition particle (SRP) binding motif in 50-bp bins within (red) and flanking (white) CNV breakpoints, showing significant enrichment of the motif at CNV breakpoints; bootstrap 95% confidence intervals are indicated by blue bars. c, The density of the 13-bp motif predictive of recombination hotspots seems to be increased directly adjacent to VNTR CNVs but not around non-VNTR CNVs.
Figure 4. Circular map showing the genomic distribution of different classes of CNVs and their population differentiation
Chromosomes are shown colour-coded in the penultimate circle. The innermost circle shows lines connecting the origin and the new location of 58 putative inter-chromosomal duplications, coloured according to their chromosome of origin. The next circle out shows a stacked histogram representing the number of deletions (red), duplications (green) and multiallelic (blue) loci in 5-Mb bins. The next circle out shows a stacked histogram representing the number of CNVs generated by NAHR (blue), VNTR (red) and other (grey) mechanisms in each 5-Mb bin. The outermost circle shows the V_ST measure of population differentiation between CEU and YRI discovery samples for each CNV.
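The caption does not define the VST (V_ST) statistic. The sketch below assumes the definition commonly used for CNV data, V_ST = (V_T − V_S) / V_T, where V_T is the variance of log2 ratios across all individuals and V_S is the within-population variance averaged with weights proportional to population size. The function name `v_st`, the input layout and the use of population (rather than sample) variance are illustrative assumptions, not details taken from this page.

```python
from statistics import pvariance


def v_st(log2_ratios_by_population):
    """Estimate V_ST for one CNV under the assumed definition (V_T - V_S) / V_T.

    log2_ratios_by_population: dict mapping a population label to a list of
    per-individual log2 intensity ratios at the CNV.
    """
    all_values = [v for values in log2_ratios_by_population.values() for v in values]
    if not all_values:
        raise ValueError("no log2 ratios supplied")

    # V_T: variance across every individual, ignoring population labels
    v_total = pvariance(all_values)
    if v_total == 0:
        return 0.0  # no variation at all, so no differentiation

    # V_S: within-population variance, weighted by population size
    v_within = sum(
        len(values) * pvariance(values)
        for values in log2_ratios_by_population.values()
    ) / len(all_values)

    return (v_total - v_within) / v_total


if __name__ == "__main__":
    print(v_st({"CEU": [0.0, 0.1, -0.1], "YRI": [0.9, 1.0, 1.1]}))
```

Values near 1 indicate that almost all of the variance lies between populations; values near 0 indicate similar distributions in both.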
Figure 5. Population properties of CNV show functional impact
a, Expected derived allele frequency spectrum among 40 CEU chromosomes for different classes of genetic variation, on the basis of the estimated strength of purifying selection acting on each class (see text for details). The estimated value of γ, the average scaled population selection coefficient, is indicated in the legend for each class of variant: exonic (γ = −17, P < 10⁻³⁰), intronic (γ = −8, P < 10⁻¹⁰), and intergenic (γ = −5, P < 10⁻³⁰) CNVs. The P values are estimated using a likelihood ratio test of neutrality (γ = 0). If we do not correct for incomplete ascertainment for these three classes of CNV, we estimate γ to be −13, −7 and −4, respectively. Similarly, if we consider only sites >1 kb, which have more complete ascertainment, we estimate γ to be −15, −10 and −5, thus showing this ordering of classes of CNV to be robust. b, A CNV showing increased XP-EHH in analysis of merged SNP-CNV HapMap haplotypes; blue line and symbols, CEU-YRI; grey, CEU-CHB+JPT; green, CHB+JPT-YRI. The locations of potential functional variants are indicated by symbols: filled diamond, CNV; cross, non-synonymous SNP; x, synonymous SNP; triangle, UTR SNP. c, Linkage disequilibrium between CNV2659.1 (pink bar) and multiple sclerosis GWAS hit SNPs (pink diamonds). Near perfect linkage disequilibrium (r² = 0.95) was observed with the top hit SNP (rs47049). Patterns of linkage disequilibrium between the CNV and other HapMap SNPs are shown with black points.
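The r² value quoted in panel c is the standard squared-correlation measure of linkage disequilibrium between two biallelic loci. A minimal sketch of that calculation follows, assuming phased haplotypes with both the CNV and the SNP coded as 0/1. The function name `r_squared` and the input layout are hypothetical, and the page does not say how the authors computed the statistic.

```python
def r_squared(haplotypes):
    """Compute r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (cnv_allele, snp_allele) pairs, each allele coded 0 or 1.
    Uses r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at the CNV
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at the SNP
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    print(r_squared([(1, 1), (1, 1), (0, 0), (0, 1)]))
```

An r² close to 1, as reported in the caption, means the SNP genotype predicts the CNV genotype almost perfectly, so an association signal at the SNP would also capture the CNV.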


Cited by 840 articles
