Genome Biol. 8 (6), R118

Fast-evolving Noncoding Sequences in the Human Genome

Christine P Bird et al. Genome Biol.

Abstract

Background: Gene regulation is considered one of the driving forces of evolution. Although protein-coding DNA sequences and RNA genes have been subject to recent evolutionary events in the human lineage, it has been hypothesized that the large phenotypic divergence between humans and chimpanzees has been driven mainly by changes in gene regulation rather than altered protein-coding gene sequences. Comparative analysis of vertebrate genomes has revealed an abundance of evolutionarily conserved but noncoding sequences. These conserved noncoding (CNC) sequences may well harbor critical regulatory variants that have driven recent human evolution.

Results: Here we identify 1,356 CNC sequences that appear to have undergone dramatic human-specific changes in selective pressures, at least 15% of which have substitution rates significantly above that expected under neutrality. The 1,356 'accelerated CNC' (ANC) sequences are enriched in recent segmental duplications, suggesting a recent change in selective constraint following duplication. In addition, single nucleotide polymorphisms within ANC sequences have a significant excess of high frequency derived alleles and high FST values relative to controls, indicating that acceleration and positive selection are recent in human populations. Finally, a significant number of single nucleotide polymorphisms within ANC sequences are associated with changes in gene expression. The probability of variation in an ANC sequence being associated with a gene expression phenotype is fivefold higher than variation in a control CNC sequence.

Conclusion: Our analysis suggests that ANC sequences have until very recently played a role in human evolution, potentially through lineage-specific changes in gene regulation.
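
The two population-genetic statistics named in the Results, derived allele frequency and FST, can be illustrated with a minimal sketch. The abstract does not say which FST estimator was used, so the version below assumes the simple Wright formulation for a single biallelic SNP with equally weighted populations. The function names and example counts are hypothetical and are not data from the study.

```python
def derived_allele_frequency(derived_count, total_chromosomes):
    """Fraction of sampled chromosomes that carry the derived (non-ancestral) allele."""
    if total_chromosomes <= 0:
        raise ValueError("total_chromosomes must be positive")
    return derived_count / total_chromosomes


def fst_biallelic(pop_freqs):
    """Wright's F_ST for one biallelic SNP (assumed estimator, equal population weights).

    pop_freqs: allele frequency of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled populations and H_S is the mean within-population heterozygosity.
    """
    if not pop_freqs:
        raise ValueError("need at least one population")
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The SNP is monomorphic across all populations, so there is no differentiation.
        return 0.0
    h_within = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    return (h_total - h_within) / h_total


# Hypothetical example: 90 derived alleles among 120 sampled chromosomes.
daf = derived_allele_frequency(90, 120)

# Hypothetical example: the allele is rare in one population and common in another.
fst = fst_biallelic([0.2, 0.8])
```

A SNP with a high derived allele frequency, or with a high FST across populations, is the kind of site that contributes to the excess reported for ANC sequences relative to controls.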

Figures

Figure 1
Substitution rates of 1,356 human-specific ANC sequences. Shown are the relative rates (P distance) of substitutions of (a) the 1,356 accelerated noncoding (ANC) sequences in the human (y-axis) and chimpanzee (x-axis) lineages, and (b) the 1,145 ANC sequences excluding those within potential confounding features (segmental duplications, copy number variants, pseudogenes, and retroposons).
Figure 2
Venn diagram of overlap between accelerated sequences in the three studies. The figure shows the overlap between the present study (yellow), the study by Pollard and coworkers [18] (green), and the study by Prabhakar and colleagues [19] (pink). ANC, accelerated noncoding; HAR, human accelerated region.
Figure 3
Segmental duplication divergence in ANC and CNC sequences. The figure shows that the divergence of paralogs in segmental duplications (SDs) where conserved noncoding (CNC) sequences (red) and power CNC sequences (purple) are found is skewed to high divergence values, whereas the accelerated noncoding (ANC) sequences (yellow) have a strong enrichment in recent segmental duplications, as expected if the acceleration is due to a recent change in selective forces (positive selection or loss of selective constraint).
Figure 4
Patterns and levels of nucleotide variation in ANC sequences. (a) The comparative derived allele frequency (DAF) spectra, in the 60 individuals of the Yoruban (YRI) population, for phase II HapMap single nucleotide polymorphisms (SNPs) in nonaccelerated conserved noncoding (CNC) sequences (n = 48,811), accelerated noncoding (ANC) sequences (n = 682), ANC sequences outside of segmental duplications, copy number variants (CNVs), retroposed genes or pseudogenes (n = 610), the two controls (n = 28,408 and n = 28,722), and the power CNC sequences (n = 10,882). (b) The comparative distributions of FST values for all phase II HapMap SNPs in ANC sequences (n = 688), ANC sequences outside of segmental duplications, CNVs, retroposed genes or pseudogenes (n = 620), power CNC sequences (n = 11,267), and nonaccelerated CNC sequences (n = 52,210).
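
The Figure 1 caption reports relative substitution rates as P distances. As a rough illustration, a P distance is the proportion of compared aligned sites at which two sequences differ. The sketch below assumes two pre-aligned sequences of equal length and skips any column containing a gap or ambiguous base. This record does not describe how the study assigned substitutions to the human or chimpanzee lineage, so the function and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Only columns where both sequences have an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments: one mismatch in eight comparable sites.
distance = p_distance("ACGTACGT", "ACGTACGA")
```

Computing such a distance separately for each lineage against a shared reference (for example an inferred ancestral sequence, which is an assumption here) would give one value per axis of a plot like Figure 1.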


References

    1. Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA. The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol. 2003;20:1377–1419. doi: 10.1093/molbev/msg140.
    2. King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science. 1975;188:107–116. doi: 10.1126/science.1090005.
    3. Dermitzakis ET, Clark AG. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002;19:1114–1121.
    4. Hardison RC. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 2000;16:369–372. doi: 10.1016/S0168-9525(00)02081-3.
    5. Nobrega MA, Ovcharenko I, Afzal V, Rubin EM. Scanning human gene deserts for long-range enhancers. Science. 2003;302:413. doi: 10.1126/science.1088328.
