Pre-mRNA secondary structure prediction aids splice site prediction

Pac Symp Biocomput. 2002:223-34.

Abstract

Accurate splice site prediction is a critical component of any computational approach to gene prediction in higher organisms. Existing approaches generally use sequence-based models that capture local dependencies among nucleotides in a small window around the splice site. We present evidence that the computationally predicted secondary structure of moderate-length pre-mRNA subsequences contains information that can be exploited to improve acceptor splice site prediction beyond what is possible with conventional sequence-based approaches. Both decision tree and support vector machine classifiers, using folding energy and structure metrics that characterize helix formation near the splice site, achieve a 5-10% reduction in error rate on a human data set. Based on our data, we hypothesize that acceptors preferentially exhibit short helices at the splice site.
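
To make the idea of "structure metrics characterizing helix formation near the splice site" concrete, the following is a minimal sketch of how such features might be derived from a predicted structure. It is not the authors' method: the function name helix_features, the flank parameter, the three features, and the dot-bracket input format (the notation commonly emitted by RNA folding programs) are all assumptions made for illustration. The abstract does not specify which metrics were used.

```python
def helix_features(dot_bracket, site, flank=10):
    """Crude helix metrics in a window around a candidate splice site.

    Hypothetical illustration only; the feature set is an assumption.
    dot_bracket: predicted structure, '(' / ')' = paired, '.' = unpaired.
    site: 0-based index of the candidate splice site in the string.
    flank: number of positions to include on each side of the site.
    """
    lo = max(0, site - flank)
    hi = min(len(dot_bracket), site + flank + 1)
    window = dot_bracket[lo:hi]

    # True where the base is predicted to be paired.
    paired = [c in "()" for c in window]

    # Longest run of consecutive paired positions: a rough proxy for
    # helix length (it does not check that the run is a single stem).
    longest = run = 0
    for is_paired in paired:
        run = run + 1 if is_paired else 0
        longest = max(longest, run)

    return {
        "paired_fraction": sum(paired) / len(window),
        "longest_paired_run": longest,
        "site_is_paired": dot_bracket[site] in "()",
    }


# Toy structure, not real data: a single hairpin.
features = helix_features("..((((...))))..", site=4, flank=2)
print(features)
```

Features of this kind, together with a folding energy for the same subsequence, could then be passed as a numeric vector to a decision tree or support vector machine classifier alongside sequence-based features.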

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computer Simulation
  • Exons
  • Likelihood Functions
  • Models, Genetic
  • Nucleic Acid Conformation*
  • RNA Precursors / chemistry*
  • RNA Splicing*
  • RNA, Messenger / chemistry*
  • RNA, Messenger / genetics
  • Reproducibility of Results
  • Sequence Analysis, RNA
  • Software

Substances

  • RNA Precursors
  • RNA, Messenger