
Abstract

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

Figures

Figure 1. Protocol outline for two copy number variation (CNV) detection platforms
The experimental procedures for Comparative Genomic Hybridization (CGH) on the WGTP array and for Comparative Intensity Analysis on the 500K EA platform are shown schematically (see Supplementary Methods for details) for a comparison of two male genomes (NA10851 and NA19007). The genome profile shows, chromosome by chromosome, the log2 ratio of copy number between these two genomes. The 500K EA data are smoothed over a 5-probe window. Below the genome profiles are zoomed plots of chromosome 8 and of a 10 Mb window containing a large duplication in NA19007 that was identified on both platforms (red bracket).
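The log2 ratio and the 5-probe smoothing described above can be illustrated with a minimal Python sketch. It assumes that normalized, positive per-probe intensities for a test genome and a reference genome are already available. The function names `log2_ratios` and `smooth_window` are hypothetical, and the centered window that is truncated at the ends is an assumption, not the procedure given in the Supplementary Methods.

```python
import math
from typing import List, Sequence


def log2_ratios(test: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Per-probe log2(test / reference); assumes normalized, positive intensities."""
    if len(test) != len(reference):
        raise ValueError("intensity vectors must have equal length")
    return [math.log2(t / r) for t, r in zip(test, reference)]


def smooth_window(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average over `window` probes, truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy example: a single-copy gain (3 vs 2 copies) in probes 5-9 of the test genome.
test = [100.0] * 5 + [150.0] * 5 + [100.0] * 5
reference = [100.0] * 15
profile = smooth_window(log2_ratios(test, reference))
print(round(profile[7], 3))  # prints 0.585, i.e. log2(3/2) inside the gain
```

Smoothing trades breakpoint resolution for lower noise. A 5-probe average blurs the edges of the gain over two probes on each side, but it leaves the plateau level unchanged.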
Figure 2. Heritability of 5 CNVs in 4 HapMap trios
Panel A. Distributions of WGTP log2 ratios at 5 genotypable CNVs. Each histogram of log2 ratios in the 270 HapMap individuals shows three clusters, each corresponding to one genotype of a biallelic CNV. The two alleles are depicted as broken and complete bars, representing the lower and higher copy number alleles, respectively. Red lines above each histogram mark the log2 ratios of the 12 individuals shown in panel B. Panel B. Mendelian inheritance of five CNVs in four parent-offspring trios. The CNVs were genotyped from WGTP clones: green, Chr8tp-17E9; yellow, Chr1tp-31C8; blue, Chr5tp-22E4; red, Chr6tp-5C12; black, Chr6tp-11A11.
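The genotyping and inheritance check in this figure can be sketched as follows. This is a minimal Python sketch, assuming a biallelic CNV whose log2 ratios form three clusters. It uses a simple one-dimensional k-means (k = 3), which is a swapped-in technique and not necessarily the clustering method the authors used. Genotypes are encoded as the number of higher copy number alleles (0, 1 or 2), which is also an assumption. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence

# Alleles a parent can transmit, indexed by genotype = number of
# higher-copy-number alleles (0, 1 or 2); this encoding is an assumption.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def three_cluster_centers(ratios: Sequence[float], iters: int = 100) -> List[float]:
    """1-D k-means with k = 3, seeded at the minimum, median and maximum."""
    ordered = sorted(ratios)
    centers = [ordered[0], median(ordered), ordered[-1]]
    for _ in range(iters):
        groups: List[List[float]] = [[], [], []]
        for x in ratios:
            nearest = min(range(3), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster empties out.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def call_genotypes(ratios: Sequence[float]) -> List[int]:
    """Assign each log2 ratio to its nearest cluster; 0 = lowest copy number."""
    centers = three_cluster_centers(ratios)
    return [min(range(3), key=lambda j: abs(x - centers[j])) for x in ratios]


def mendelian_consistent(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a in TRANSMISSIBLE[mother]
               for b in TRANSMISSIBLE[father])
```

A trio is flagged as inconsistent if no combination of one transmitted allele from each parent reproduces the child's genotype. For example, two homozygous high-copy parents cannot have a heterozygous child.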
Figure 3. Defining copy number variable regions (CNVRs), copy number variants (CNVs) and CNV ends
Overlapping CNVs called in five individuals are shown schematically for four loci (blue); dashed lines indicate overlap. Copy number variable regions (CNVRs) are the union of overlapping CNVs (green). Independent, juxtaposed copy number variants (black) are identified by merging only those individual-specific CNVs that overlap by more than a threshold proportion. Intervals encompassing CNV breakpoints (red) are defined using platform-dependent criteria (Supplementary Methods). These intervals contain a significant paucity of recombination hotspots (Supplementary Table 13). This reflects their enrichment for segmental duplications, within which fewer recombination hotspots are inferred.
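The two merging rules in this caption can be contrasted with a short Python sketch. A CNVR is the union of all overlapping calls, whereas individual CNVs are merged only when their overlap exceeds a threshold. It assumes half-open `(chromosome, start, end)` intervals. Measuring overlap reciprocally (as a fraction of the longer interval) and the default threshold of 0.5 are hypothetical choices; the paper's actual threshold is not stated here. Function names are also hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end).
Interval = Tuple[str, int, int]


def cnv_regions(cnvs: List[Interval]) -> List[Interval]:
    """Union of overlapping CNV calls (from any individual) into CNVRs."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length as a fraction of the longer of the two intervals."""
    if a[0] != b[0]:
        return 0.0
    overlap = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def same_variant(a: Interval, b: Interval, threshold: float = 0.5) -> bool:
    """Merge two individual-specific CNVs only above the overlap threshold."""
    return reciprocal_overlap(a, b) > threshold
```

Two calls that share only a third of their length fall into the same CNVR but remain separate juxtaposed variants under `same_variant`. This is how a single CNVR can contain several independent CNVs.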
Figure 4. Genomic distribution of copy number variable regions
The chromosomal locations of the 1,447 CNVRs are indicated by lines on either side of the ideograms. Green lines denote CNVRs associated with segmental duplications. The length of each right-hand line represents the size of the CNVR. The length of each left-hand line indicates how frequently the CNVR is detected (minor call frequency among the 270 HapMap samples). When both platforms identify a CNVR, the higher of the two call frequencies is shown. For clarity, length and frequency are plotted on log-transformed scales to compress their dynamic range (see scale bars). All data can be viewed at the Database of Genomic Variants (http://projects.tcag.ca/variation/).
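The frequency encoded by the left-hand lines can be sketched in Python. The sketch assumes that "minor call frequency" means the fraction of the 270 samples in which the CNVR is called, folded so that it never exceeds 0.5. It also assumes a base-10 log scale. Both are interpretive assumptions, and the function names are hypothetical.

```python
import math
from typing import Optional

N_SAMPLES = 270  # HapMap individuals screened in the study


def minor_call_frequency(n_called: int, n_samples: int = N_SAMPLES) -> float:
    """Fraction of samples carrying the call, folded so that it is <= 0.5."""
    freq = n_called / n_samples
    return min(freq, 1.0 - freq)


def plotted_frequency(calls_wgtp: Optional[int], calls_500k: Optional[int],
                      n_samples: int = N_SAMPLES) -> float:
    """Higher minor call frequency of the platforms that detected the CNVR."""
    freqs = [minor_call_frequency(c, n_samples)
             for c in (calls_wgtp, calls_500k) if c is not None]
    if not freqs:
        raise ValueError("CNVR was not detected on either platform")
    return max(freqs)


def log_scale(value: float) -> float:
    """Log10 transform used to compress the dynamic range for display."""
    return math.log10(value)
```

For example, a CNVR called in 27 samples on WGTP and 54 samples on 500K EA would be drawn at a frequency of 0.2, the larger of the two platform values.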
Figure 5. Classes of copy number variants
CNVs identified on the WGTP and 500K EA platforms can be classified into five types (see text) according to the population distribution of log2 ratios (illustrated here with WGTP data). Biallelic CNVs (deletions and duplications) can be genotyped if the clusters representing the different genotypes are sufficiently distinct. For each platform, the number of CNVs in each class is given, together with the proportion of those CNVs that overlap segmental duplications. Overall, 20% of CNVRs on the 500K EA platform and 34% on the WGTP platform overlapped segmental duplications.
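A proportion like the 20% and 34% quoted in this caption is the fraction of regions that overlap at least one segmental duplication. A minimal Python sketch of that calculation follows, assuming half-open `(chromosome, start, end)` intervals. The function names are hypothetical, and the toy data below are invented for illustration, not taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end).
Interval = Tuple[str, int, int]


def overlaps_any(region: Interval, segdups: List[Interval]) -> bool:
    """True if the region shares at least one base with any segmental duplication."""
    chrom, start, end = region
    return any(c == chrom and start < sd_end and sd_start < end
               for c, sd_start, sd_end in segdups)


def fraction_overlapping(regions: List[Interval], segdups: List[Interval]) -> float:
    """Proportion of regions overlapping segmental duplications."""
    if not regions:
        return 0.0
    hits = sum(1 for region in regions if overlaps_any(region, segdups))
    return hits / len(regions)
```

The linear scan is fine for illustration. For genome-wide annotation tracks, an interval tree or a sorted sweep would avoid comparing every region against every duplication.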
Figure 6. Patterns of linkage disequilibrium between CNVs and SNPs
Panel A. The proportion of variants tagged by a nearby proxy SNP (from Phase I HapMap) increases as the pairwise LD (r2) required of a proxy SNP is relaxed. This cumulative distribution is shown both for Phase I HapMap SNPs and for 65 biallelic CNVs. Panel B. Histograms of the log2 ratios among all HapMap individuals for thirteen multi-allelic CNVs. For each CNV, the maximal squared Pearson correlation coefficient (R2) observed at a neighbouring Phase I HapMap SNP is given. At biallelic CNVs, this statistic is highly correlated with pairwise LD (r2) (Supplementary Figure 15).
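The tagging statistics in panels A and B can be sketched in Python. The sketch assumes that SNP genotypes are coded as allele dosages (0, 1, 2). The CNV vector can be either a genotype dosage (biallelic) or a raw log2 ratio (multi-allelic). Using the squared Pearson correlation of unphased dosages as a stand-in for haplotype-based LD r2 is an assumption. The function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two per-individual vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic vector carries no LD information
    return sxy * sxy / (sxx * syy)


def best_proxy_r2(cnv: Sequence[float], neighbour_snps: Iterable[Sequence[float]]) -> float:
    """Maximal R2 between a CNV and any neighbouring SNP (panel B statistic)."""
    return max((r_squared(cnv, snp) for snp in neighbour_snps), default=0.0)


def fraction_tagged(best_r2s: List[float], threshold: float) -> float:
    """Share of variants with a proxy at or above `threshold` (one point of panel A)."""
    return sum(1 for r in best_r2s if r >= threshold) / len(best_r2s)
```

Evaluating `fraction_tagged` over a range of decreasing thresholds traces a cumulative curve like the one in panel A. Allele coding does not matter, because squaring removes the sign of the correlation.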
Figure 7. Population clustering from CNV genotypes
A triangle plot showing the clustering of 210 unrelated HapMap individuals, assuming three ancestral populations (k = 3). The proximity of an individual to each apex of the triangle indicates the estimated proportion of that individual's genome with ancestry in the corresponding inferred ancestral population. Most individuals from the same population cluster near a common apex, showing that this analysis clearly discriminates between populations. The clustering was qualitatively similar to that obtained previously with a similar number of biallelic Alu insertion polymorphisms in different African, European and Asian population samples.
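Placing an individual in the triangle is a barycentric-to-Cartesian mapping of their three ancestry proportions. This short Python sketch shows only that plotting step. It assumes the proportions have already been estimated by a model-based clustering program, which this caption does not name. The unit equilateral layout of the apexes and the function name are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Apex positions of a unit equilateral triangle (hypothetical layout).
APEXES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_xy(proportions: Sequence[float]) -> Tuple[float, float]:
    """Map three ancestry proportions to a point inside the triangle."""
    if len(proportions) != 3:
        raise ValueError("expected exactly three ancestry proportions (k = 3)")
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    weights = [p / total for p in proportions]  # renormalize to sum to 1
    x = sum(w * ax for w, (ax, _) in zip(weights, APEXES))
    y = sum(w * ay for w, (_, ay) in zip(weights, APEXES))
    return x, y
```

An individual with all of their estimated ancestry in one population sits exactly on that apex. Equal thirds place them at the centroid.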
Figure 8. Population differentiation for copy number variation
Population differentiation, estimated by VST, is plotted along each chromosome for each of the three pairwise population comparisons. For each comparison, VST values for all clones on the WGTP platform are shown in the lighter colour with filled circles. VST values of CNVs detected on the 500K EA platform are superimposed in a darker shade with unfilled circles. Histograms showing the distributions of log2 ratios (on the WGTP platform) among the unrelated individuals in each population are plotted for 4 example CNVs that exhibit high population differentiation, labelled A-D. Each example histogram is labelled with the chromosome coordinates of the WGTP clone. Flanking or encompassed genes are given for those CNVs mentioned in the text.
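This caption does not restate the VST formula. The sketch below assumes an FST-like definition for a continuous measure: VST = (VT - VS) / VT. Here VT is the variance of log2 ratios across all unrelated individuals from both populations. VS is the within-population variance averaged with weights proportional to population size. Using population variances (no n - 1 correction) is a further assumption, and the function name is hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def v_st(pop_a: Sequence[float], pop_b: Sequence[float]) -> float:
    """Differentiation in log2 ratios between two populations (FST analogue)."""
    pooled = list(pop_a) + list(pop_b)
    v_total = pvariance(pooled)
    if v_total == 0:
        return 0.0  # no copy number variation at this locus
    n_a, n_b = len(pop_a), len(pop_b)
    # Within-population variance, weighted by population size.
    v_within = (n_a * pvariance(pop_a) + n_b * pvariance(pop_b)) / (n_a + n_b)
    return (v_total - v_within) / v_total
```

VST approaches 0 when both populations share the same distribution of log2 ratios. It approaches 1 when each population is internally uniform but the two differ, as in the highly differentiated examples A-D.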

Cited by 1,518 PubMed Central articles