Comprehensive splice-site analysis using comparative genomics

Nucleic Acids Res. 2006;34(14):3955-67. doi: 10.1093/nar/gkl556. Epub 2006 Aug 12.

Abstract

We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG, and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT-AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3' splice sites (3'ss); and (iv) distinct evolutionary histories of 5' and 3' splice sites. Our study shows that the identification of broad patterns in naturally occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
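The abstract refers to the information content of splice-site motifs without giving the calculation. As a minimal sketch, the per-position information content of a set of aligned sites is commonly computed as 2 bits minus the Shannon entropy of the nucleotide frequencies at that position. The version below assumes a uniform background and applies no small-sample correction; whether the authors used exactly this form is not stated here. The function name `information_content` and the four 9-mer sequences are illustrative inventions, not data from the study.

```python
import math
from collections import Counter


def information_content(seqs, alphabet="ACGT"):
    """Return per-position information content (bits) for aligned sequences.

    Assumes a uniform background over `alphabet` and no small-sample
    correction, so each position scores log2(len(alphabet)) - entropy.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    if any(ch not in alphabet for s in seqs for ch in s):
        raise ValueError("sequences contain characters outside the alphabet")

    max_bits = math.log2(len(alphabet))
    bits = []
    for pos in range(length):
        # Count how often each nucleotide occurs at this position.
        counts = Counter(s[pos] for s in seqs)
        entropy = 0.0
        for base in alphabet:
            if counts[base]:
                p = counts[base] / len(seqs)
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits


# Hypothetical 5' splice-site 9-mers (3 exonic + 6 intronic positions).
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
toy_bits = information_content(toy_sites)

for position, value in enumerate(toy_bits):
    print(f"position {position}: {value:.3f} bits")
```

In this toy set the invariant GT dinucleotide scores the maximum of 2 bits, while variable positions score lower. Comparing such profiles between the exonic and intronic portions of a motif, or between species, is the kind of analysis the abstract summarizes.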

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Base Sequence
  • Caenorhabditis elegans / genetics
  • Conserved Sequence
  • Databases, Nucleic Acid
  • Drosophila melanogaster / genetics
  • Evolution, Molecular
  • Genomics / methods*
  • Humans
  • Internet
  • Introns
  • Phylogeny
  • RNA Splice Sites*
  • Software

Substances

  • RNA Splice Sites