PLoS One. 2012;7(8):e43128.
doi: 10.1371/journal.pone.0043128. Epub 2012 Aug 27.

29 Mammalian Genomes Reveal Novel Exaptations of Mobile Elements for Likely Regulatory Functions in the Human Genome

Free PMC article

Craig B Lowe et al. PLoS One. 2012.

Abstract

Recent research supports the view that changes in gene regulation, as opposed to changes in the genes themselves, play a significant role in morphological evolution. Gene regulation is largely dependent on transcription factor binding sites. Researchers are now able to use the available 29 mammalian genomes to measure selective constraint at the level of binding sites. This detailed map of constraint suggests that mammalian genomes co-opt fragments of mobile elements to act as gene regulatory sequence on a large scale. In the human genome we detect over 280,000 putative regulatory elements, totaling approximately 7 Mb of sequence, that originated as mobile element insertions. These putative regulatory regions are conserved non-exonic elements (CNEEs), which show considerable cross-species constraint and signatures of continued negative selection in humans, yet do not appear in a known mature transcript. These putative regulatory elements were co-opted from SINE, LINE, LTR and DNA transposon insertions. We demonstrate that at least 11%, and an estimated 20%, of gene regulatory sequence in the human genome showing cross-species conservation was co-opted from mobile elements. The location in the genome of CNEEs co-opted from mobile elements closely resembles that of CNEEs in general, except in the centers of the largest gene deserts where recognizable co-option events are relatively rare. We find that regions of certain mobile element insertions are more likely to be held under purifying selection than others. In particular, we show 6 examples where paralogous instances of an often co-opted mobile element region define a sequence motif that closely matches a transcription factor's binding profile.
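
As a quick consistency check on the figures quoted above, the mean length of a co-opted element can be estimated by dividing the total sequence by the number of elements. This is only a sketch: both inputs are the rounded values from the abstract ("over 280,000" and "approximately 7 Mb"), and 1 Mb is assumed to mean 1,000,000 bp.

```python
# Rounded figures quoted in the abstract; both are approximate.
n_elements = 280_000      # "over 280,000" putative regulatory elements
total_bp = 7_000_000      # "approximately 7 Mb", assuming 1 Mb = 1,000,000 bp

# Mean length per element, in base pairs.
mean_length_bp = total_bp / n_elements
print(f"mean element length is roughly {mean_length_bp:.0f} bp")
```

The result, about 25 bp, agrees with the mean length reported for exapted CNEEs in the caption of Figure 2.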

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The frequency of rare derived alleles is greater in CNEEs compared to neutral sites.
We compared the derived allele frequency spectra for CNEEs as a whole, CNEEs created through the co-option of mobile elements, protein-coding regions, and introns. The spectrum for CNEEs has a lower mean rank of derived allele frequencies, which is indicative of negative selection in humans (p-value not shown, Mann-Whitney U test). However, selection on these putative regulatory regions does not appear to be as strong as that on coding regions (p-value not shown, Mann-Whitney U test).
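
The comparison of mean ranks described in this caption can be illustrated with a minimal pure-Python Mann-Whitney U test. This is a sketch, not the authors' pipeline: the function name `mann_whitney_u`, the normal approximation (with tie correction, without continuity correction), and the two toy frequency lists are all assumptions made for illustration.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value from a normal approximation)."""
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of the 1-based ranks i+1 .. j
        t = j - i                          # size of this group of tied values
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                         # every value tied: no evidence either way
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical derived allele frequencies, invented for this example only.
cnee_freqs = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08]
neutral_freqs = [0.04, 0.10, 0.15, 0.22, 0.30, 0.45]

u, p = mann_whitney_u(cnee_freqs, neutral_freqs)
print(f"U = {u}, two-sided p = {p:.4f}")
```

A U value well below n1 * n2 / 2 means the first sample tends to hold the lower ranks, which is the pattern the caption interprets as an excess of rare derived alleles.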
Figure 2. Exapted CNEEs and non-exapted CNEEs have similar length distributions.
We compared the entire length of CNEEs where at least half of the bases are annotated as originating in mobile element insertions with those CNEEs not meeting this criterion. The distributions are visually similar, yet have slightly different shapes (p-value not shown, Kolmogorov-Smirnov test). The set of exapted elements has a lower mean length than the non-exapted set (25 bp versus 30 bp), showing a slight depletion of mobile elements depositing very large CNEEs.
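
The two-sample Kolmogorov-Smirnov comparison mentioned in this caption rests on one statistic: the largest vertical gap between the two empirical cumulative distributions. The sketch below computes that statistic only (no p-value). The helper name `ks_statistic` is an assumption, and the length lists are hypothetical values chosen so that their means match the 25 bp and 30 bp quoted in the caption.

```python
import bisect
from statistics import mean

def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap between two step functions is attained at an observed value.
    for v in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical CNEE lengths in bp, invented for illustration.
exapted_lengths = [12, 18, 20, 25, 25, 30, 45]
non_exapted_lengths = [15, 20, 25, 30, 30, 40, 50]

d = ks_statistic(exapted_lengths, non_exapted_lengths)
print(mean(exapted_lengths), mean(non_exapted_lengths), round(d, 4))
```

With very large samples, as in a genome-wide set of CNEEs, even a small value of this statistic can be significant, which is consistent with distributions that look similar yet differ by the test.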
Figure 3. Exapted CNEEs and non-exapted CNEEs have similar distributions of constraint.
We calculated the rate of evolution for every CNEE, with respect to the neutral rate, using PhyloFit. The exapted elements evolve at a mean of 0.30 times the neutral rate, while the non-exapted set of CNEEs evolves at 0.32 times the neutral rate. The distributions are visually similar, yet have slightly different shapes (p-value not shown, Kolmogorov-Smirnov test), with the exapted elements tending to evolve slightly more slowly.
Figure 4. Mobile elements co-opted as conserved non-exonic elements (CNEEs) are rarer than expected in gene deserts.
(A) We show the density of genes (blue), all CNEEs (red), just those CNEEs co-opted from mobile elements (green), and mobile elements (gray) windowed over 1 Mb intervals on the q arm of chromosome 16, where there are a number of gene-poor regions. Exaptations are less likely to occur in gene-poor areas when compared to CNEEs in general. (B) The difference between the density of CNEEs and that of exaptations is shown against a schematized backdrop of gene density. CNEEs have a greater normalized density in gene deserts and gene-poor regions of the genome compared to exaptations. Gene deserts, defined here as locations in the genome more than 1 Mb from the closest transcription start site, show a depletion of exaptations relative to the number of CNEEs (p-value not shown, hypergeometric test).
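
A depletion of this kind can be tested with the lower tail of the hypergeometric distribution: given N CNEEs in total, K of which are exapted, how likely is it that a subset of n CNEEs (those in gene deserts) contains k or fewer exapted ones by chance? The sketch below assumes that framing; the function name `hypergeom_lower_tail` and all counts are hypothetical and are not taken from the paper.

```python
from math import comb

def hypergeom_lower_tail(k, N, K, n):
    """P(X <= k) when n items are drawn without replacement from N, K of them marked."""
    total = comb(N, n)
    lowest = max(0, n - (N - K))   # fewest marked items a draw of size n can contain
    ways = sum(comb(K, i) * comb(N - K, n - i) for i in range(lowest, k + 1))
    return ways / total

# Toy numbers: 1,000 CNEEs, 200 exapted, 100 CNEEs in gene deserts,
# of which only 8 are exapted (20 would be expected under independence).
p_toy = hypergeom_lower_tail(8, 1000, 200, 100)
print(f"P(X <= 8) = {p_toy:.2e}")
```

The calculation uses exact integer arithmetic before the final division, so it stays accurate for counts far larger than these, at the cost of speed.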
Figure 5. Ancient CNEEs are more likely to be found far from transcription start sites.
We infer the branch of origin for all human CNEEs. For the CNEEs originating on each branch we calculate the percentage found more than 1 Mb from the closest transcription start site. Ancient CNEEs are twice as likely to be found far from genes compared to their younger counterparts. Periods: Devonian, Carboniferous, Permian, Triassic, Jurassic, Cretaceous, Paleogene, Neogene.
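
The per-branch percentage described in this caption can be sketched as a nearest-neighbour lookup against sorted transcription start sites. The helper names, the single-chromosome coordinate system, and the toy positions and branch labels below are assumptions for illustration; a real analysis would repeat this per chromosome with annotated coordinates.

```python
import bisect

def distance_to_nearest_tss(pos, tss_sorted):
    """Distance in bp from pos to the closest entry of a non-empty sorted TSS list."""
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)      # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # nearest TSS before pos
    return min(candidates)

def percent_far(positions, tss_sorted, threshold=1_000_000):
    """Percentage of positions lying more than threshold bp from any TSS."""
    far = sum(1 for p in positions if distance_to_nearest_tss(p, tss_sorted) > threshold)
    return 100.0 * far / len(positions)

# Hypothetical coordinates on one chromosome, grouped by hypothetical branch labels.
tss = [1_000_000, 5_000_000]
cnees_by_branch = {
    "older_branch": [2_500_000, 3_000_000, 1_200_000, 4_500_000],
    "younger_branch": [900_000, 1_100_000, 4_800_000, 3_000_000],
}

percent_by_branch = {b: percent_far(pos, tss) for b, pos in cnees_by_branch.items()}
print(percent_by_branch)
```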
Figure 6. Contribution of mobile element classes, superfamilies, and families.
We plotted the number of CNEEs exapted from each repeat class and superfamily, as well as the top contributing families. The superfamilies and families are colored to match the class they belong to. LINE insertions are the class that has created the most putative regulatory elements. This class is largely composed of the L1 and L2 superfamilies, which have both made large contributions. There is little statistical power to identify recently inserted sequence as conserved. For this reason, the amount of functional sequence contributed by mobile element superfamilies with recently active members will be an underestimate.
Figure 7. Contribution of mobile element classes, superfamilies, and families relative to their abundance.
We plotted the number of exapted instances per genomic instance for classes and superfamilies, as well as the top ranked mobile element families. We colored each superfamily and family to represent the class to which it belongs. Mobile element superfamilies with recently active members will have their contribution underestimated. This is due to the limited statistical power to detect regions evolving under purifying selection when only a few closely related orthologs are available.
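
The normalization behind this figure, exapted instances per genomic instance, is a simple ratio, but it can reorder the families relative to a ranking by raw counts. The sketch below uses placeholder family names and invented counts purely to show that effect.

```python
# Hypothetical counts per mobile element family (not values from the paper).
genomic_counts = {"family_A": 50_000, "family_B": 2_000, "family_C": 400_000}
exapted_counts = {"family_A": 500, "family_B": 100, "family_C": 800}

def exaptation_rate(exapted, genomic):
    """Exapted instances per genomic instance, for families with at least one copy."""
    return {name: exapted.get(name, 0) / total
            for name, total in genomic.items() if total > 0}

rates = exaptation_rate(exapted_counts, genomic_counts)
ranked_by_count = sorted(exapted_counts, key=exapted_counts.get, reverse=True)
ranked_by_rate = sorted(rates, key=rates.get, reverse=True)
print(ranked_by_count)
print(ranked_by_rate)
```

In this toy case the most abundant family contributes the most exapted copies in absolute terms yet has the lowest rate per copy, which is the distinction between the raw and abundance-normalized views.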
Figure 8. Paralogous instances of mobile elements show selective pressures matching transcription factor binding preferences.
We hypothesized that when a particular region of a mobile element is repeatedly exapted, it may be used to perform the same function in paralogous instances. We collected sequences in the human genome representing families of paralogs that all originated from the same bases of a mobile element insertion. We used MEME to define a motif for each family that represents the selective pressure acting on these insertions after their exaptation by the host. In 6 cases this motif has a significant match to the binding preference of transcription factors (p-values are corrected for multiple tests). These results are consistent with mobile element consensus sequences spreading functional, or near-functional, transcription factor binding sites throughout the genome, which are then exapted by the host. A more detailed analysis of one of these matches is shown in Figure 9.
Figure 9. L1MC4 may be a fecund source of octamer binding sites.
The probability density of each base in the L1MC4 consensus being present in a genomic copy (gray) or an exapted copy (red) is plotted (top plot). Zooming in on the second-highest peak of exaptation probability, we also show the consensus sequence. By using motif finding software on the exaptation events in the extant human genome that contributed to this peak, we obtained a profile describing the selection acting on paralogous exaptations of this small region. This profile is easily alignable to the consensus, but it is interesting to note the ‘C’ in the consensus (bold type) that routinely changes to a ‘T’ in the exaptations. The profile describing the selective pressure acting on these paralogs is similar to the octamer binding profile, which is consistent with this section of the L1MC4 consensus often being exapted on the human lineage to act as a binding site for a member of the octamer family of proteins.
Figure 10. Phylogenetic tree of 29 placental mammals including some outgroup species.
We used the topology from the 2x Mammals Consortium. We have included opossum, platypus, chicken, frog, and tetraodon as outgroup species.
