Modeling splice sites with Bayes networks

Bioinformatics. 2000 Feb;16(2):152-8. doi: 10.1093/bioinformatics/16.2.152.

Abstract

Motivation: The main goal of this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of transcription in human DNA). These models can subsequently be used to improve the performance of gene-finding systems. The models built here attempt to capture long-distance dependencies between non-adjacent bases.

Results: An efficient modeling method is described which models biological data more accurately than a first-order Markov model without increasing the number of parameters. Intuitively, a small number of parameters helps a learning system to avoid overfitting. Several experiments with the model are presented, which show a small improvement in the average accuracy as compared with a simple Markov model. These experiments suggest that single long-distance dependencies do not help the recognition problem, thus confirming several previous studies which have used more heuristic modeling techniques.
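The parameter-count claim can be illustrated with a minimal sketch. The abstract does not specify the network structure, so the code below assumes a tree-shaped Bayes network in which each sequence position has at most one parent position. Under that assumption, a first-order Markov model is the special case where the parent of position i is position i-1, and pointing a position at a non-adjacent parent leaves the number of table entries unchanged. All function names, the pseudocount, and the toy sequences are hypothetical and are not taken from the paper or its software.

```python
import math

ALPHABET = "ACGT"

def fit_tree_model(seqs, parents, pseudocount=1.0):
    """Estimate P(x_i | x_parent(i)) from aligned, equal-length sequences.

    parents[i] is the index of the single parent of position i, or None
    for a root. The caller must supply an acyclic parent assignment;
    this sketch does not check for cycles.
    """
    tables = []
    for i, p in enumerate(parents):
        # A root has one unconditional row; a child has one row per parent base.
        contexts = [""] if p is None else list(ALPHABET)
        counts = {c: {b: pseudocount for b in ALPHABET} for c in contexts}
        for s in seqs:
            ctx = "" if p is None else s[p]
            counts[ctx][s[i]] += 1
        # Normalise each row of counts into a conditional distribution.
        table = {}
        for ctx, row in counts.items():
            total = sum(row.values())
            table[ctx] = {b: n / total for b, n in row.items()}
        tables.append(table)
    return tables

def log_likelihood(seq, parents, tables):
    """Sum of log P(x_i | x_parent(i)) over all positions."""
    ll = 0.0
    for i, p in enumerate(parents):
        ctx = "" if p is None else seq[p]
        ll += math.log(tables[i][ctx][seq[i]])
    return ll

def n_parameters(parents):
    """Number of probability-table entries implied by a parent assignment."""
    k = len(ALPHABET)
    return sum(k if p is None else k * k for p in parents)

def chain_parents(length):
    """First-order Markov chain: position i depends on position i - 1."""
    return [None] + list(range(length - 1))

# Toy aligned sites in which position 2 always copies position 0.
toy_sites = ["AAA", "ACA", "CAC", "CCC"]
chain = chain_parents(3)      # parents: [None, 0, 1]
long_range = [None, 0, 0]     # position 2 depends on non-adjacent position 0

chain_tables = fit_tree_model(toy_sites, chain)
long_tables = fit_tree_model(toy_sites, long_range)

print(n_parameters(chain), n_parameters(long_range))
print(log_likelihood("AAA", chain, chain_tables))
print(log_likelihood("AAA", long_range, long_tables))
```

On this toy data the long-range structure assigns the site "AAA" a higher likelihood than the chain while using the same number of table entries. That is only an artifact of how the toy data was constructed; the abstract reports that on real splice sites the gain over a simple Markov model was small.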

Availability: This software is available for download and as a web resource at http://www.ai.uic.edu/software

Contact: kasif@eecs.uic.edu

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Computer Simulation*
  • DNA / analysis*
  • Humans
  • Models, Statistical*
  • Neural Networks, Computer*
  • RNA Splicing*
  • Software

Substances

  • DNA