Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes

Proc Natl Acad Sci U S A. 2011 Jul 5;108(27):11093-8. doi: 10.1073/pnas.1101135108. Epub 2011 Jun 17.


We present an intuitive strategy for predicting the effect of sequence variation on splicing. In contrast to transcriptional elements, splicing elements appear to be strongly position dependent. We demonstrated that exonic binding of the normally intronic splicing factor, U2AF65, inhibits splicing. Reasoning that the positional distribution of a splicing element is a signature of its function, we developed a method for organizing all possible sequence motifs into clusters based on the genomic profile of their positional distribution around splice sites. Binding sites for serine/arginine-rich (SR) proteins tended to be exonic, whereas heterogeneous ribonucleoprotein (hnRNP) recognition elements were mostly intronic. In addition to the known elements, novel motifs were returned and validated. This method was also predictive of splicing mutations. A mutation in a motif creates a new motif whose positional distribution is sometimes similar in shape to that of the original motif and sometimes different. We created an intraallelic distance measure to capture this property and found that mutations that created large intraallelic distances disrupted splicing in vivo, whereas mutations with small distances did not alter splicing. Analyzing the dataset of human disease alleles revealed known splicing mutants to have high intraallelic distances and suggested that 22% of disease alleles that were originally classified as missense mutations may also affect splicing. This category, together with mutations in the canonical splicing signals, suggests that approximately one-third of all disease-causing mutations alter pre-mRNA splicing.
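The idea of comparing the positional distributions of a reference motif and its mutated counterpart can be illustrated with a minimal sketch. The abstract does not specify how the intraallelic distance is computed, so the L1 distance between normalized positional profiles used here is an assumption, not the authors' measure. The function names (`positional_profile`, `profile_distance`, `intraallelic_distance`) and the toy sequences are hypothetical and serve illustration only.

```python
def positional_profile(motif, windows):
    """Normalized frequency of motif start positions across equal-length
    sequence windows (e.g. windows aligned on a splice site)."""
    k = len(motif)
    width = len(windows[0])
    counts = [0] * (width - k + 1)
    for seq in windows:
        for i in range(width - k + 1):
            if seq[i:i + k] == motif:
                counts[i] += 1
    total = sum(counts)
    # A motif that never occurs gets an all-zero profile.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def profile_distance(p, q):
    """L1 distance between two profiles (assumed metric, see lead-in).
    0 means identical shape; 2 means completely non-overlapping."""
    return sum(abs(a - b) for a, b in zip(p, q))


def intraallelic_distance(ref_motif, alt_motif, windows):
    """Distance between the positional profiles of the reference motif
    and the motif created by a mutation."""
    return profile_distance(
        positional_profile(ref_motif, windows),
        positional_profile(alt_motif, windows),
    )


# Toy windows, invented for illustration; not real genomic data.
toy_windows = ["GAAGAACCTT", "TGAAGAACTT", "CCTTTTTTTT"]
print(intraallelic_distance("GAAGAA", "TTTTTT", toy_windows))
print(intraallelic_distance("GAAGAA", "GAAGAA", toy_windows))
```

Under this sketch, a mutation that converts a motif into one with a differently shaped profile yields a large distance, and a mutation that preserves the profile shape yields a distance near zero. A real analysis would build profiles from genome-wide windows around annotated splice sites rather than a handful of toy strings.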

MeSH terms

  • Algorithms
  • Alleles
  • Base Sequence
  • Cluster Analysis
  • Exons
  • Genetic Variation
  • Humans
  • Mutation
  • Nuclear Proteins / metabolism
  • RNA Precursors / genetics*
  • RNA Precursors / metabolism*
  • RNA Splicing / genetics*
  • Ribonucleoproteins / metabolism
  • Splicing Factor U2AF

Substances

  • Nuclear Proteins
  • RNA Precursors
  • Ribonucleoproteins
  • Splicing Factor U2AF
  • U2AF2 protein, human