Evolutionarily emerged G tracts between the polypyrimidine tract and 3' AG are splicing silencers enriched in genes involved in cancer

BMC Genomics. 2014 Dec 19;15(1):1143. doi: 10.1186/1471-2164-15-1143.


Background: The 3' splice site (SS) at the end of pre-mRNA introns has a consensus sequence (Y)nNYAG for constitutive splicing of mammalian genes. Deviation from this consensus could change or interrupt the usage of the splice site leading to alternative or aberrant splicing, which could affect normal cell function or even the development of diseases. We have shown that the position "N" can be replaced by a CA-rich RNA element called CaRRE1 to regulate the alternative splicing of a group of genes.

Results: Extending this work, we searched the human genome for purine-rich elements between the -3 and -10 positions of the 3' splice sites of annotated introns. This identified several thousand such 3'SS; more than a thousand of them contain at least one G tract. These sites deviate significantly from the consensus of constitutive splice sites and are highly associated with alternative splicing events, particularly alternative 3' splice site usage and intron retention. We show by mutagenesis analysis and RNA interference that the G tracts are splicing silencers and that a group of the associated exons is controlled by the G tract binding proteins hnRNP H/F. Species comparison of a group of the 3'SS among vertebrates suggests that most (~87%) of the G tracts emerged in ancestors of mammals during evolution. Moreover, the host genes are most significantly associated with cancer.
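
The window scan described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the final G of the 3' AG is position -1, so positions -10 to -3 are the eight nucleotides immediately upstream of the AG, and that a G tract is a run of three or more consecutive G's (the abstract does not give a length threshold). The function names and the example sequences are hypothetical.

```python
import re

# Assumption: a "G tract" is a run of three or more consecutive G's.
G_TRACT = re.compile(r"G{3,}")


def upstream_window(intron_end: str) -> str:
    """Return the 8 nt at positions -10..-3, counting the last G of the 3' AG as -1."""
    seq = intron_end.upper().replace("U", "T")
    if len(seq) < 10 or not seq.endswith("AG"):
        raise ValueError("need at least 10 nt of intron sequence ending in AG")
    return seq[-10:-2]


def purine_fraction(window: str) -> float:
    """Fraction of A or G bases in the window."""
    return sum(base in "AG" for base in window) / len(window)


def find_g_tracts(window: str) -> list:
    """Return (start position, tract) pairs; the window is assumed to start at -10."""
    return [(m.start() - 10, m.group()) for m in G_TRACT.finditer(window)]


if __name__ == "__main__":
    # Hypothetical intron 3' ends: one with a G tract, one close to the consensus.
    for intron_end in ("TTTCTTTCCTGGGGCAG", "TTTTCTTTTCCCTTTCAG"):
        window = upstream_window(intron_end)
        print(window, purine_fraction(window), find_g_tracts(window))
```

For the first sequence the window is CCTGGGGC, with a single G tract starting at position -7; the second, pyrimidine-rich sequence yields no G tract.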

Conclusion: We call these elements, together with CaRRE1, regulatory RNA elements between the Py and 3'AG (REPA). The emergence of REPA in this highly constrained region indicates that this location has been remarkably permissive for the emergence of de novo regulatory RNA elements, even purine-rich motifs, in a large group of mammalian genes during evolution. This evolutionary change controls alternative splicing, likely to diversify proteomes for particular cellular functions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Animals
  • Base Sequence
  • Consensus Sequence
  • Evolution, Molecular*
  • GC Rich Sequence*
  • Gene Silencing*
  • Genes, Neoplasm / genetics*
  • Genomics
  • Humans
  • Mutation
  • Neoplasms / genetics*
  • RNA Splice Sites / genetics*
  • RNA Splicing / genetics*

