Assessing the fraction of short-distance tandem splice sites under purifying selection

RNA. 2008 Apr;14(4):616-29. doi: 10.1261/rna.883908. Epub 2008 Feb 11.


Many alternative splice events result in subtle mRNA changes, and most of them occur at short-distance tandem donor and acceptor sites. The splicing mechanism of such tandem sites likely involves the stochastic selection of either splice site. While tandem splice events are frequent, it is unknown how many are functionally important. Here, we use phylogenetic conservation to address this question, focusing on tandems with a distance of 3-9 nucleotides. We show that previous contradictory results on whether alternative or constitutive tandem motifs are more conserved between species can be explained by a statistical paradox (Simpson's paradox). Applying methods that take biases into account, we found higher conservation of alternative tandems in mouse, dog, and even chicken, zebrafish, and Fugu genomes. We estimated a lower bound for the number of alternative sites that are under purifying (negative) selection. While the absolute number of conserved tandem motifs decreases with the evolutionary distance, the fraction under selection increases. Interestingly, a number of frameshifting tandems are under selection, suggesting a role in regulating mRNA and protein levels via nonsense-mediated decay (NMD). An analysis of the intronic flanks shows that purifying selection also acts on the intronic sequence. We propose that stochastic splice site selection can be an advantageous mechanism that allows constant splice variant ratios in situations where a deviation in this ratio is deleterious.
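Two quantitative points in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. First, it shows which tandem distances in the 3-9 nt range shift the reading frame when the tandem lies in a coding sequence. Second, it shows how Simpson's paradox can reverse a conservation comparison when counts are pooled across strata. The counts are hypothetical toy numbers, not data from the study, and the stratifying variable is left abstract. The Mantel-Haenszel common odds ratio serves here as one standard stratified estimator; the abstract does not say which bias-aware methods the authors applied.

```python
from fractions import Fraction

# Tandem distances (nt) covered by the study.
TANDEM_DISTANCES = range(3, 10)


def is_frameshifting(distance: int) -> bool:
    """True if choosing one tandem site over the other shifts the reading frame.

    Within a coding sequence, a distance that is not a multiple of 3 changes
    the downstream frame; a multiple of 3 only adds or removes whole codons.
    """
    return distance % 3 != 0


# HYPOTHETICAL toy counts, NOT from the paper: per stratum,
# (conserved, not conserved) for alternative and constitutive tandem motifs.
# A "stratum" stands for any confounder that affects both the
# alternative/constitutive mix and the baseline conservation rate.
STRATA = {
    "stratum_1": {"alt": (9, 1), "const": (80, 20)},
    "stratum_2": {"alt": (30, 70), "const": (2, 8)},
}


def stratum_rates(s):
    """Conservation rates (alternative, constitutive) within one stratum."""
    return (Fraction(s["alt"][0], sum(s["alt"])),
            Fraction(s["const"][0], sum(s["const"])))


def pooled_rates(strata):
    """Conservation rates after naively pooling all strata."""
    alt_c = sum(s["alt"][0] for s in strata.values())
    alt_n = sum(sum(s["alt"]) for s in strata.values())
    con_c = sum(s["const"][0] for s in strata.values())
    con_n = sum(sum(s["const"]) for s in strata.values())
    return Fraction(alt_c, alt_n), Fraction(con_c, con_n)


def mantel_haenszel_or(strata) -> Fraction:
    """Mantel-Haenszel common odds ratio of conservation, alternative vs constitutive.

    Values > 1 mean alternative motifs are more often conserved after
    adjusting for the stratifying variable.
    """
    num = Fraction(0)
    den = Fraction(0)
    for s in strata.values():
        a, b = s["alt"]    # alternative: conserved, not conserved
        c, d = s["const"]  # constitutive: conserved, not conserved
        n = a + b + c + d
        num += Fraction(a * d, n)
        den += Fraction(b * c, n)
    return num / den


if __name__ == "__main__":
    print("frameshifting distances:",
          [d for d in TANDEM_DISTANCES if is_frameshifting(d)])
    for name, s in STRATA.items():
        alt, con = stratum_rates(s)
        print(name, "alt", float(alt), "const", float(con))
    alt, con = pooled_rates(STRATA)
    print("pooled alt", round(float(alt), 3), "const", round(float(con), 3))
    print("Mantel-Haenszel OR", round(float(mantel_haenszel_or(STRATA)), 3))
```

With these toy counts, alternative motifs are more conserved within each stratum (0.9 vs 0.8 and 0.3 vs 0.2). Pooling gives 39/110 vs 82/110, which favours constitutive motifs. The stratified odds ratio (21/11, about 1.91) recovers the within-stratum direction.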

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing*
  • Animals
  • Base Sequence
  • Chickens
  • Computational Biology
  • Conserved Sequence
  • Databases, Nucleic Acid
  • Dogs
  • Expressed Sequence Tags
  • Humans
  • Introns
  • Macaca mulatta
  • Mice
  • Phylogeny
  • RNA Splice Sites*
  • RNA, Messenger / genetics
  • Selection, Genetic
  • Tandem Repeat Sequences


Substances

  • RNA Splice Sites
  • RNA, Messenger