Comparative Study
Science, 349 (6253), aab3761

Global Diversity, Population Stratification, and Selection of Human Copy-Number Variation

Peter H Sudmant et al. Science.


In order to explore the diversity and selective signatures of duplication and deletion human copy-number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single-nucleotide-variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.


Fig. 1. Analysis of CNVs in several world populations
The geographical locations of the 125 human populations, including two archaic genomes, assessed in this study. Populations are colored by their continental population groups, and archaic individuals are indicated in black.
Fig. 2. Population structure and CNV diversity
Principal component analysis (PCA) of individuals assessed in this study plotted for bi-allelic deletions (A) and duplications (B) with colors and shapes representing continental and specific populations, respectively. Individuals are projected along the PC1 and PC2 axes. The deletion (C) and duplication (D) heterozygosity plotted and grouped by continental population. The relationship between SNV heterozygosity and deletion (E) or duplication (F) heterozygosity is compared.
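The per-individual heterozygosity compared in panels C to F can be illustrated with a minimal sketch. It assumes heterozygosity is the fraction of called bi-allelic sites at which an individual carries two different alleles; the abstract does not state the exact estimator used, and the function name and input layout are hypothetical.

```python
def heterozygosity(genotypes):
    """Fraction of called bi-allelic sites that are heterozygous.

    genotypes: list of (allele_a, allele_b) tuples for one individual,
    with None marking a missing allele call (hypothetical input layout).
    """
    # Keep only sites where both alleles were called.
    called = [g for g in genotypes if None not in g]
    if not called:
        raise ValueError("no called genotypes")
    # A site is heterozygous when the two alleles differ.
    n_het = sum(1 for a, b in called if a != b)
    return n_het / len(called)


# Toy genotypes at four deletion sites (0 = reference, 1 = deleted allele).
toy_individual = [(0, 1), (0, 0), (1, 1), (1, 0)]
toy_het = heterozygosity(toy_individual)
```

Applying the same function to SNV genotypes and to deletion or duplication genotypes would give the paired values that a plot like panels E and F compares.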
Fig. 3. Selection on CNVs
Folded allele frequency spectra of exon-intersecting deletions (A) and duplications (B). While deletions intersecting exons are significantly rarer than intergenic deletions, exon-intersecting duplications show no difference compared to intergenic duplications. The mean frequency of CNVs beyond a minimum size threshold is plotted for deletions (C) and duplications (D). A strong negative correlation between size and allele frequency is observed for deletions but less so for duplications.
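A folded allele frequency spectrum counts sites by minor allele count, so it does not require knowing which allele is ancestral. The following is a minimal sketch of that folding step, not the authors' pipeline; the function name and the toy counts are hypothetical.

```python
from collections import Counter


def folded_sfs(alt_counts, n_chromosomes):
    """Return the folded spectrum as {minor_allele_count: number_of_sites}.

    alt_counts: alternate-allele count at each site.
    n_chromosomes: number of sampled chromosomes (2 x diploid individuals).
    """
    spectrum = Counter()
    for count in alt_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError(f"allele count {count} out of range")
        # Folding: use whichever allele is rarer in the sample.
        minor = min(count, n_chromosomes - count)
        if minor > 0:  # skip sites that are monomorphic in the sample
            spectrum[minor] += 1
    return dict(spectrum)


# Toy allele counts for six sites observed in 10 chromosomes.
toy_spectrum = folded_sfs([1, 9, 5, 0, 10, 2], n_chromosomes=10)
```

Computing one spectrum for exon-intersecting variants and another for intergenic variants, and comparing the two, mirrors the contrast drawn in panels A and B.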
Fig. 4. Population-stratified CNVs and archaic introgression
(A) Four specific examples of population-stratified CNVs intersecting genes are shown, including LRRIQ3, the pancreatic colipase CLPS, the sperm head and acrosome formation gene DPY19L2, and the haptoglobin and haptoglobin-related genes HP and HPR. Dot plots indicate the copy number of the locus in each individual, and pie charts with colors depict the continental population distribution per copy number (see text for details and Figs. 1 and 2 and dot plots for color scheme). (B) Predicted copy number on the basis of read depth for a 73.5 kbp duplication on chromosome 16. It is observed in the archaic Denisovan genome and at 0.84 allele frequency in Papuan and Bougainville populations, yet is absent from all other assessed populations. The duplication intersects two microRNAs. The orange arrow corresponds to the position and orientation of this duplication as further highlighted in (C) and (D). (C) A heatmap representation of a ~1 Mbp region of chromosome 16p12 (chr16:21518638-22805719). Each row of the heatmap represents the estimated copy number in 1 kbp windows of a single individual across this locus. Genes, annotated segmental duplications, and arrows highlighting the size and orientation in the reference of the Denisova/Papuan-specific duplication locus (locus D) and three other duplicated loci (A, B, and C) of interest are shown below. (D) The structure of duplications A, B, C, and D (as shown in 4C over the same locus) in the reference genome and the discordant paired-end read placements used to characterize two duplication structures. Structure A/C is found in all individuals, though it is not present in the reference genome, while structure B/D is found only in Papuan and Bougainville individuals, indicating a large, complex duplication (~225 kbp) composed of different segmental duplications. Both the A/C and B/D duplication architectures exhibit inverted orientations compared to the reference. The number of reads supporting each structure in all Oceanic and non-Oceanic individuals is indicated. (E) Maximum likelihood tree of the 16p12 duplication locus (duplication D in 4B, 4C, and 4D) constructed from the locus in orangutan, Denisova, the human reference, and the inferred sequence of the Papuan duplication (24). All bootstrap values are 100%.
Fig. 5. The ancestral human genome and CNV burden
(A) A heatmap of the allele frequency of 571 (1.55 Mbp) nonrepetitive sequences absent from the human reference genome yet segregating in at least one human population, ordered in humans by a maximum likelihood tree (49). Four groups of interest are highlighted: G1 – ancestral sequences that have almost been completely lost from the human lineage, G2 – ancestral sequences that are largely fixed but rarely deleted (also absent in human reference), G3 – ancestral sequences that have become copy number variable since the divergence of humans and Neanderthals/Denisovans ~700 kya, and G4 – sequences potentially lost in Neanderthals and Denisovans since their divergence from humans. (B) The distributions of 10,000 block-bootstrapped estimates of the difference in load between African (AFR) and non-African (nAFR) populations, considering either the reference genome alone (GRCh37) or the reference genome supplemented by sequence absent from it (GRCh37 + NHP) (see text for details). (C) Violin plots of the distribution of the ratio of deletion base pairs to SNV base pairs differing between every pair of African individuals (AFR-AFR), every pair of non-African individuals (nAFR-nAFR), and every pair of one non-African and one African individual (nAFR-AFR). (D) Heatmap representation of the mean ratio of deletion to SNV base pairs differing between individuals from pairs of populations.
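The block bootstrap named in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the genome has already been split into blocks, that each block carries one summed load value per group, and that the same resampled blocks are used for both groups. The function name, inputs, and default seed are hypothetical, and the definition of load is left to the paper's main text.

```python
import random


def block_bootstrap_difference(afr_blocks, nafr_blocks,
                               n_replicates=10_000, seed=0):
    """Bootstrap the nAFR minus AFR difference by resampling genomic blocks.

    afr_blocks, nafr_blocks: per-block load values for each group, given
    in the same block order (hypothetical input layout).
    """
    if len(afr_blocks) != len(nafr_blocks):
        raise ValueError("both groups must cover the same blocks")
    rng = random.Random(seed)
    n_blocks = len(afr_blocks)
    estimates = []
    for _ in range(n_replicates):
        # Draw blocks with replacement; resampling whole blocks keeps
        # linked variants together within each replicate.
        picks = [rng.randrange(n_blocks) for _ in range(n_blocks)]
        afr_total = sum(afr_blocks[i] for i in picks)
        nafr_total = sum(nafr_blocks[i] for i in picks)
        estimates.append(nafr_total - afr_total)
    return estimates


# Toy data: every block differs by exactly 1 between the groups, so
# every replicate of three resampled blocks gives a difference of 3.
toy_estimates = block_bootstrap_difference([1, 2, 3], [2, 3, 4],
                                           n_replicates=100)
```

The spread of the returned estimates approximates the sampling distribution of the difference, which is the kind of distribution panel B displays.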
