Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization

Nucleic Acids Res. 2007;35(13):4250-63. doi: 10.1093/nar/gkm402. Epub 2007 Jun 18.


Despite a growing number of splicing mutations found in hereditary diseases, utilization of aberrant splice sites and their effects on gene expression remain challenging to predict. We compiled sequences of 346 aberrant 5' splice sites (5'ss) that were activated by mutations in 166 human disease genes. Mutations within the 5'ss consensus accounted for 254 cryptic 5'ss, and mutations elsewhere activated 92 de novo 5'ss. Point mutations leading to cryptic 5'ss activation were most common in the first intron nucleotide, followed by the fifth nucleotide. Substitutions at position +5 were exclusively G>A transitions, which was largely attributable to high mutability rates of C/G>T/A. However, the frequency of point mutations at position +5 was significantly higher than that observed in the Human Gene Mutation Database, suggesting that alterations of this position are particularly prone to aberrant splicing, possibly due to a requirement for sequential interactions with U1 and U6 snRNAs. Cryptic 5'ss were best predicted by computational algorithms that accommodate nucleotide dependencies and not by weight-matrix models. Discrimination of intronic 5'ss from their authentic counterparts was less effective than for exonic sites, as the former were intrinsically stronger than the latter. Computational prediction of exonic de novo 5'ss was poor, suggesting that their activation critically depends on exonic splicing enhancers or silencers. The authentic counterparts of aberrant 5'ss were significantly weaker than the average human 5'ss. The development of an online database of aberrant 5'ss will be useful for studying basic mechanisms of splice-site selection, identifying splicing mutations and optimizing splice-site prediction algorithms.
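
To make the contrast between weight-matrix models and dependency-aware algorithms concrete, the following is a minimal sketch of weight-matrix scoring for a 9-nucleotide 5'ss (positions -3 to +6). It is not one of the tools compared in the study. The training sequences, the pseudocount, the uniform background and the function names (build_pwm, score, mutate) are all assumptions chosen for illustration. Because the score is a sum of independent per-position terms, a model of this kind cannot represent dependencies between positions.

```python
import math

# Hypothetical toy training set of 9-nt 5' splice sites:
# indices 0-2 are exon positions -3..-1, indices 3-8 are intron positions +1..+6.
# These sequences are illustrative only and are NOT data from the study.
TRAINING_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "AAGGTAAGG",
    "CTGGTGAGT",
    "GAGGTAAGC",
]

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix (one dict per position).

    Assumes a uniform background and a fixed pseudocount; both are
    simplifications made for this sketch.
    """
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background) for b in BASES}
        )
    return pwm


def score(pwm, site):
    """Sum per-position log-odds; positions are treated as independent."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix length")
    return sum(column[base] for column, base in zip(pwm, site))


def mutate(site, intron_pos, new_base):
    """Substitute the base at intron position +1..+6 of a 9-nt site."""
    if not 1 <= intron_pos <= 6:
        raise ValueError("intron_pos must be between 1 and 6")
    idx = 2 + intron_pos  # position +1 sits at index 3
    return site[:idx] + new_base + site[idx + 1:]


pwm = build_pwm(TRAINING_SITES)
wild_type = "CAGGTAAGT"
plus5_mutant = mutate(wild_type, 5, "A")  # a G>A substitution at +5

print(wild_type, round(score(pwm, wild_type), 2))
print(plus5_mutant, round(score(pwm, plus5_mutant), 2))
```

In this toy model the G>A substitution at +5 lowers the score of the site, but the penalty is the same regardless of which bases occupy the other positions. That fixed, context-free penalty is the limitation that dependency-aware methods are designed to avoid.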

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology / methods*
  • Consensus Sequence
  • DNA Mutational Analysis*
  • Databases, Nucleic Acid
  • Genetic Diseases, Inborn / genetics*
  • Humans
  • Mutation*
  • Nucleotides / chemistry
  • Point Mutation
  • RNA Splice Sites*
  • Sequence Alignment
  • Software


Substances

  • Nucleotides
  • RNA Splice Sites