A Bayesian framework for combining gene predictions

Bioinformatics. 2002 Jan;18(1):19-27. doi: 10.1093/bioinformatics/18.1.19.

Abstract

Motivation: Gene identification and gene discovery in new genomic sequences are among the most timely computational questions addressed by bioinformatics scientists. This research has produced several systems that have been used successfully in many whole-genome analysis projects. As the number of such systems grows, the need for a rigorous way to combine their predictions becomes more pressing.

Results: In this paper we provide a Bayesian network framework for combining gene predictions from multiple systems. The framework allows us to treat the problem as combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We introduce, for the first time, the use of hidden input/output Markov models for combining gene predictions. We apply the framework to the analysis of the Adh region in Drosophila, which has been carefully studied in the context of gene finding and was used as the basis for the GASP competition. The main challenge in combining gene prediction programs is that the systems rely on similar features, such as codon usage, and as a result their predictions are often correlated. We show that our approach is promising for improving prediction accuracy and provides a systematic and flexible framework for incorporating multiple sources of evidence into gene prediction systems.
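
To make the contrast between majority voting and a probabilistic combination of experts concrete, the following is a minimal sketch and not the paper's method. It assumes each gene finder casts a binary coding/non-coding vote at one sequence position, and it uses a naive Bayes combiner, which treats the experts as conditionally independent. That assumption is exactly the one the abstract says is violated when predictors share features, so the sketch illustrates the baseline idea only. The function names, the per-expert true-positive and false-positive rates, and the prior are all hypothetical values chosen for illustration.

```python
import math


def majority_vote(votes):
    """Return 1 (coding) if more than half of the binary votes are 1, else 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def naive_bayes_posterior(votes, tpr, fpr, prior=0.5):
    """Posterior P(coding | votes) under a conditional-independence assumption.

    votes: 0/1 predictions, one per expert (hypothetical gene finders).
    tpr:   assumed P(vote = 1 | coding) for each expert.
    fpr:   assumed P(vote = 1 | non-coding) for each expert.
    prior: assumed prior probability that the position is coding.
    """
    # Accumulate evidence as log-odds so each expert adds one term.
    log_odds = math.log(prior / (1.0 - prior))
    for v, t, f in zip(votes, tpr, fpr):
        if v:
            log_odds += math.log(t / f)
        else:
            log_odds += math.log((1.0 - t) / (1.0 - f))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative numbers only: one reliable expert says "coding",
# two weak experts say "non-coding".
votes = [1, 0, 0]
tpr = [0.95, 0.60, 0.60]
fpr = [0.05, 0.40, 0.40]

vote_result = majority_vote(votes)
posterior = naive_bayes_posterior(votes, tpr, fpr)
print("majority vote:", vote_result)
print("naive Bayes posterior:", round(posterior, 4))
```

With these made-up reliabilities, the majority vote follows the two weak experts, while the probabilistic combiner weights the single reliable expert more heavily and favors the coding call. A model that also captures dependencies between experts, as the framework described in the abstract aims to do, would need additional structure beyond this sketch.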

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem*
  • Computational Biology
  • Drosophila / genetics
  • Expert Systems
  • Genes, Insect
  • Genetic Techniques*
  • Genomics / statistics & numerical data
  • Markov Chains
  • Software