Nat Ecol Evol. 2020 Nov;4(11):1558-1566.
doi: 10.1038/s41559-020-01284-0. Epub 2020 Aug 24.

Selection against archaic hominin genetic variation in regulatory regions

Natalie Telis et al. Nat Ecol Evol. 2020 Nov.

Abstract

Traces of Neandertal and Denisovan DNA persist in the modern human gene pool, but have been systematically purged by natural selection from genes and other functionally important regions. This implies that many archaic alleles harmed the fitness of hybrid individuals, but the nature of this harm is poorly understood. Here, we show that enhancers contain less Neandertal and Denisovan variation than expected given the background selection they experience, suggesting that selection acted to purge these regions of archaic alleles that disrupted their gene regulatory functions. We infer that selection acted mainly on young archaic variation that arose in Neandertals or Denisovans shortly before their contact with humans; enhancers are not depleted of older variants found in both archaic species. Some types of enhancer appear to have tolerated introgression better than others; compared with tissue-specific enhancers, pleiotropic enhancers show stronger depletion of archaic single-nucleotide polymorphisms. To some extent, evolutionary constraint is predictive of introgression depletion, but certain tissues' enhancers are more depleted of Neandertal and Denisovan alleles than expected given their comparative tolerance to new mutations. Foetal brain and muscle are the tissues whose enhancers show the strongest depletion of archaic alleles, but only brain enhancers show evidence of unusually stringent purifying selection. We conclude that epistatic incompatibilities between human and archaic alleles are needed to explain the degree of archaic variant depletion from foetal muscle enhancers, perhaps due to divergent selection for higher muscle mass in archaic hominins compared with humans.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Extended Data Fig. 1 | Replication of archaic SNP depletion after sampling control SNPs in clusters.
This plot replicates the analysis from Fig. 1b using archaic SNPs and controls sampled to match the clustering induced by LD structure. The depletion of archaic SNPs from exons and enhancers is nearly identical to the depletion measured using controls not sampled to match the spatial clustering of introgressed SNPs.
Extended Data Fig. 2 | Human versus archaic reference sequence divergence as a function of enhancer pleiotropy and tissue activity.
a, Within enhancers and exons, we measured divergence of the human reference from the Altai Neandertal, the Altai Denisovan, and the YRI African genomes, then normalized each by the divergence of the same genomes within adjacent control regions. Exon divergence ratios <1 indicate that purifying selection has slowed down their sequence evolution compared to less constrained adjacent regions. In contrast, divergence is accelerated in enhancers relative to control regions. This acceleration is positively correlated with pleiotropy and is stronger for archaic vs. human comparisons than for the African vs. human reference genome comparison. In the absence of selection against archaic enhancer variation, this divergence pattern should cause archaic SNPs to be enriched within high-pleiotropy enhancers, not depleted as we in fact observe. b, We see no correlation across tissue-specific enhancer sets between Neandertal divergence from the human reference and archaic allele depletion from enhancers. This suggests that differences between tissues in the depletion of introgressed archaic variants are not driven by differences in divergence between reference genomes. c, We see no correlation across tissue-specific enhancer sets between Denisovan divergence from the human reference and archaic allele depletion from enhancers. All error bars in panels a–c are 95% confidence intervals derived from the binomial approximation to the Bernoulli distribution.
Extended Data Fig. 3 | Neandertal and Denisovan variant depletion as a function of enhancer tissue activity.
These plots show the data from Fig. 3a with Neandertal and Denisovan odds ratios on separate plots for clarity.
Extended Data Fig. 4 | Joint distributions of Neandertal and Denisovan SNP depletion within each SGDP population.
Although there are differences between populations, particularly since Denisovan introgression is sparse and noisy, all show that brain and fetal muscle enhancers are the most depleted of introgression. In most populations the ‘Blood & T-cell’ tissue is least depleted of introgression.
Extended Data Fig. 5 | Counts of young and old archaic alleles present in modern populations and shared by archaic reference genomes.
a, Recall that ‘young’ introgression calls are SNPs that appear in Call Set 2 generated by Sankararaman et al., while ‘old’ calls appear in Call Set 1 for at least one archaic species but not in either set of young calls. In every modern human population, we find that 20–30% of old introgressed SNPs are shared with both the Altai Neandertal and Altai Denisovan, suggesting they likely predate the divergence of Neandertals and Denisovans or are at least old enough to have passed between the two species by gene flow. b, In contrast, only 10–20% of young introgressed SNPs are present in both archaic reference genomes. Over 45% of young Neandertal alleles are shared with the Altai Neandertal but not the Altai Denisovan; conversely, over 45% of young Denisovan alleles are shared with only the Denisovan reference. Compared to the sets of young Neandertal and Denisovan alleles, old Neandertal and Denisovan alleles look more similar to each other in their archaic reference sharing profiles: each contains 10–25% Neandertal-specific alleles and 2–10% Denisovan-specific alleles. These patterns support our hypothesis that the old calls are indeed older than the young calls. c, This panel shows the numbers of introgressed SNPs classified as young versus old within each population. Each SNP set is further subdivided into SNPs that appear in the Neandertal call set only, the Denisovan call set only, or the intersection of both call sets.
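The young/old definition in this caption reduces to set operations, which the following minimal Python sketch illustrates. The function name, the dictionary layout (archaic species mapped to a set of SNP identifiers) and the toy identifiers are assumptions made for illustration; they are not taken from the paper or from the published call sets.

```python
def classify_calls(call_set_1, call_set_2):
    """Split introgression calls into 'young' and 'old' SNP sets.

    call_set_1, call_set_2: dict mapping an archaic species name to the
    set of SNP ids called for that species (hypothetical layout).
    """
    # Young: any SNP that appears in Call Set 2 for either species.
    young = set().union(*call_set_2.values())
    # Old: in Call Set 1 for at least one species, but in neither young set.
    in_set_1 = set().union(*call_set_1.values())
    old = in_set_1 - young
    return young, old


# Toy example with made-up SNP ids.
set_1 = {"neandertal": {"a", "b", "c"}, "denisovan": {"c", "d"}}
set_2 = {"neandertal": {"a"}, "denisovan": {"d", "e"}}
young, old = classify_calls(set_1, set_2)
```

In the toy example, SNPs "b" and "c" end up in the old set because they appear only in Call Set 1.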
Extended Data Fig. 6 | Site frequency spectra of introgressed SNPs classified as young versus old in each SGDP population.
For each call set, the corresponding vertical line demarcates the mean allele frequency of that category. In each population, the old ‘1 minus 2’ call set has the highest mean allele frequency, adding support to our hypothesis that these variants are older and/or less deleterious than either population-specific Call Set 2.
Extended Data Fig. 7 | GC content cannot explain differences in singleton enrichment between tissues.
a, We partitioned the site frequency spectra of enhancers into SNPs that have GC ancestral alleles and SNPs that have AT ancestral alleles. Using each of these disjoint variant sets, we then computed singleton enrichment in enhancers versus adjacent control regions. GC-biased gene conversion is expected to have opposite effects on the two frequency spectra, increasing the proportion of GC-ancestral singletons and decreasing the proportion of AT-ancestral singletons. Despite this confounder, the finding that brain enhancers are enriched for singletons holds up when we restrict to either GC-ancestral SNPs or AT-ancestral SNPs. b, Across tissues, enhancers are enriched for GC base pairs compared to adjacent genomic regions. However, there is no correlation between GC content enrichment and the singleton enrichment that we attribute to purifying selection.
Extended Data Fig. 8 | Joint distribution across tissues of enhancer singleton enrichment and introgression depletion.
Although singleton enrichment is correlated with depletion of young Neandertal alleles and young Denisovan alleles, the significance of this correlation disappears when all brain-related tissues are excluded from the regression.
Extended Data Fig. 9 | Mean enhancer phastCons scores partitioned by tissue activity.
Across all tissues, enhancers have a mean phastCons score that is slightly elevated above the genomic mean, indicating that these regions are slightly conserved over phylogenetic timescales. Fetal brain and neurosphere enhancers have a higher mean phastCons score than enhancers active in any other tissues. This result mirrors our findings on the landscape of recent purifying selection as measured by site frequency skew: fetal brain enhancers are more conserved than other regulatory elements, but fetal muscle enhancers are not.
Extended Data Fig. 10 | Random sampling of control SNPs to match introgressed SNPs that have been pooled across populations.
This figure illustrates how we construct the set of SNPs that are eligible to be chosen as control SNPs that match introgressed SNPs for allele frequency and B statistic after these introgressed SNPs have been pooled across all SGDP populations.
Fig. 1 | Introgressed variants are depleted from enhancers and exons relative to matched control variants.
a, Schematic illustrating the process of matching archaic variants to control variants with identical allele frequencies and B statistic values. b, Introgressed-to-control variant odds ratios showing depletion of Neandertal variants from both exons and enhancers in every population sequenced by the SGDP. In the case of Oceanians, a similar pattern holds for Denisovan variant calls. Error bars span 95% binomial confidence intervals.
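The matching and odds-ratio steps named in this caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the tuple layout `(snp_id, allele_count, b_value)`, sampling with replacement, and the discretized B values are all hypothetical, and the interval is a standard Wald interval on the log odds ratio, swapped in because the caption does not specify how its binomial intervals were computed.

```python
import math
import random
from collections import defaultdict


def sample_matched_controls(introgressed, candidates, seed=0):
    """Pick one control SNP per introgressed SNP with the same (allele count, B) key.

    Both inputs are lists of (snp_id, allele_count, b_value) tuples; this
    layout is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for snp_id, allele_count, b_value in candidates:
        pool[(allele_count, b_value)].append(snp_id)

    controls = []
    for _, allele_count, b_value in introgressed:
        matches = pool.get((allele_count, b_value))
        # Assumption: sample with replacement; None marks an unmatched SNP.
        controls.append(rng.choice(matches) if matches else None)
    return controls


def odds_ratio_with_ci(arch_in, arch_out, ctrl_in, ctrl_out, z=1.96):
    """Archaic-to-control odds ratio for a region class, with a Wald interval.

    arch_in/arch_out: archaic SNP counts inside/outside the region class.
    ctrl_in/ctrl_out: matched control SNP counts inside/outside it.
    """
    odds_ratio = (arch_in / arch_out) / (ctrl_in / ctrl_out)
    se_log = math.sqrt(1 / arch_in + 1 / arch_out + 1 / ctrl_in + 1 / ctrl_out)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With counts of this form, an interval lying entirely below 1.0 would indicate fewer archaic SNPs in the region class than the matched controls predict.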
Fig. 2 | Archaic variant depletion is correlated with the number of cell types in which an enhancer is active.
The pleiotropy number of an enhancer is defined as the number of tissues in which the enhancer is active (as illustrated by the inset panel). Enhancers are grouped into bins of increasing pleiotropy number such that each bin contains a roughly equal number of enhancers; within each bin, we computed the odds ratio of archaic SNPs relative to matched control SNPs in each of the SGDP continental groups. Error bars span 95% binomial confidence intervals. The dotted line on the y axis marks an archaic-to-control variant odds ratio of 1.0, meaning that every confidence interval lying completely below this line corresponds to a set of enhancers that are depleted of archaic variation at the P < 0.05 significance level. Enhancers active in only a single cell type do not appear depleted of archaic SNPs, whereas enhancers that are active in multiple cell types contain up to 20% fewer archaic variant calls than expected.
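As a rough illustration of the binning described in this caption, the sketch below computes pleiotropy numbers and splits enhancers into bins of roughly equal size. The input layout (enhancer identifier mapped to a set of tissue names) and both function names are assumptions for illustration. The sketch also lets enhancers with the same pleiotropy number fall into adjacent bins, which the paper's binning may or may not do.

```python
def pleiotropy_numbers(activity):
    """Map each enhancer id to the number of tissues in which it is active.

    activity: dict mapping enhancer id -> set of tissue names (hypothetical layout).
    """
    return {enhancer: len(tissues) for enhancer, tissues in activity.items()}


def equal_count_bins(pleiotropy, n_bins):
    """Group enhancers into n_bins bins of increasing pleiotropy, roughly equal in size."""
    # Sort by pleiotropy number; break ties by id so the result is deterministic.
    ordered = sorted(pleiotropy, key=lambda e: (pleiotropy[e], e))
    bins = [[] for _ in range(n_bins)]
    for rank, enhancer in enumerate(ordered):
        bins[rank * n_bins // len(ordered)].append(enhancer)
    return bins


# Toy example with made-up enhancers and tissues.
activity = {
    "e1": {"brain"},
    "e2": {"brain", "muscle"},
    "e3": {"blood"},
    "e4": {"brain", "muscle", "blood"},
    "e5": {"brain", "muscle", "blood", "skin"},
    "e6": {"muscle", "skin"},
}
bins = equal_count_bins(pleiotropy_numbers(activity), n_bins=3)
```

Each bin could then be passed to an odds-ratio calculation comparing archaic and matched control SNP counts.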
Fig. 3 | Neandertal and Denisovan variant depletion varies between enhancers active in different tissues.
a, The set of enhancers active in each Roadmap cell line is significantly depleted of both Neandertal and Denisovan variation, with the exception of blood and T cells, whose Denisovan depletion confidence interval overlaps an odds ratio of 1.0 (horizontal dotted line). Data points that lie below the diagonal dotted line correspond to tissues whose enhancers are more depleted of Denisovan SNPs compared with Neandertal SNPs. The slope of the dashed regression line is significantly positive (P < 4 × 10⁻⁵), implying that Neandertal variant depletion and Denisovan variant depletion are correlated across tissues (r² = 0.537). b, Even after restricting to enhancers active in foetal muscle or foetal brain (the two tissue types most depleted of introgressed variation), pleiotropy remains negatively correlated with archaic SNP depletion. The difference between these two tissues and other tissues is driven mainly by enhancers of intermediate pleiotropy. All error bars span 95% binomial confidence intervals. ESC, embryonic stem cell; HSC, haematopoietic stem cell; mesench, mesenchymal; myosat, myosatellite cell; iPSC, induced pluripotent stem cell; ES-deriv, cell line derived from embryonic stem cells; neurosph, neurosphere cell.
Fig. 4 | Different landscapes of young and old introgressed archaic variation.
a, We classified introgression calls as old or young based on their presence in call sets 1 and 2 generated by Sankararaman et al. By design, the old alleles (present in call set 1 but not call set 2) are more often shared between Neandertals and Denisovans, and we hypothesize that many of these alleles arose before the Neandertal/Denisovan divergence, as pictured, or else crossed between the two species via Neandertal/Denisovan gene flow. In contrast, we hypothesize that the young alleles most often arose after Neandertals and Denisovans had begun to diverge. b, Numerical counts of old and young introgressed variants in the SGDP human genomes. Young Denisovan variants are probably rare because the Altai Denisovan was not closely related to the Denisovan population that primarily interbred with humans. c, We computed the fraction of CRF introgression calls that occur as derived alleles in the Altai Neandertal genome and/or the Altai Denisovan genome. As expected, old variants are two- to fourfold more likely than young variants to occur in both archaic reference genomes (see Extended Data Fig. 5a,b for more data on allele sharing between introgression calls and the reference archaic genomes). d, In contrast with the young archaic variation considered elsewhere in this paper, old archaic variation is not measurably depleted from enhancers, even enhancers active in numerous tissues. All error bars span 95% binomial confidence intervals; confidence intervals that do not intersect the dotted line (shown at an odds ratio of 1.0) indicate significant depletion of archaic variants relative to matched controls.
Fig. 5 | Rare variant enrichment reveals that enhancer sequences are weakly selectively constrained.
a, Theory predicts that the SFS becomes skewed towards rare variants by the action of purifying selection. b, In African data from the 1000 Genomes project, the enhancer SFS has a higher proportion of singletons compared with control regions adjacent to enhancers. c, Every tissue type’s enhancer complement is enriched for singletons compared with adjacent control regions. This comparison of singleton enrichment odds ratios versus Denisovan depletion odds ratios shows that foetal brain, neurosphere cells and adult brain are outliers under stronger constraint. The y axis has been split to accommodate the magnitude of singleton enrichment in exons and the vertical dotted line demarcates a Denisovan-to-control variant ratio of 1.0. Error bars span two binomial test standard errors. d, Comparison of the singleton enrichment landscape with the Neandertal depletion landscape. e, Enhancer pleiotropy is negatively correlated with singleton enrichment, although even enhancers of pleiotropy 1 have a singleton enrichment odds ratio significantly greater than 1. All error bars span 95% binomial confidence intervals.
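A singleton enrichment odds ratio of the kind plotted here could be computed along the following lines. This is a minimal sketch: the function name and the input layout (one sample allele count per SNP) are assumptions for illustration, and a singleton is taken to be a variant observed exactly once in the sample.

```python
def singleton_enrichment(enhancer_counts, control_counts):
    """Odds ratio of singletons among enhancer SNPs versus control-region SNPs.

    Each argument is a list with one sample allele count per SNP
    (hypothetical layout); a count of 1 marks a singleton.
    """
    def singleton_odds(counts):
        singletons = sum(1 for c in counts if c == 1)
        others = len(counts) - singletons
        return singletons / others

    return singleton_odds(enhancer_counts) / singleton_odds(control_counts)


# Toy example with made-up allele counts.
ratio = singleton_enrichment([1, 1, 1, 2, 5], [1, 2, 3, 4])
```

A ratio above 1 means singletons make up a larger share of enhancer SNPs than of control-region SNPs, which is the skew towards rare variants that panel a attributes to purifying selection.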

References

    1. Green RE et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
    2. Prüfer K et al. The complete genome sequence of a Neanderthal from the Altai mountains. Nature 505, 43–49 (2014).
    3. Vernot B et al. Excavating Neanderthal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016).
    4. Chen L, Wolf A, Fu W, Li L & Akey J Identifying and interpreting apparent Neandertal ancestry in African individuals. Cell 180, 677–687 (2020).
    5. Sankararaman S et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357 (2014).
