Computational gene prediction using multiple sources of evidence

Genome Res. 2004 Jan;14(1):142-8. doi: 10.1101/gr.1562804.

Abstract

This article describes a computational method for constructing gene models from evidence generated by a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag (EST) and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
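
The abstract does not spell out the three combination algorithms, so the following is only a minimal illustrative sketch of the general idea of evidence combination, not the Combiner's actual method. It assumes a simple per-base weighted vote: each evidence source marks intervals as coding, each source carries a hypothetical weight, and a base is called coding when the weighted fraction of agreeing sources reaches a threshold. The source names, weights, threshold, and function names are all invented for illustration. The evaluation helper assumes the convention common in gene-finding work, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP), measured at the nucleotide level.

```python
def covered_positions(intervals):
    """Return the set of bases covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def combine_by_weighted_vote(seq_len, evidence, weights, threshold=0.5):
    """Merge evidence into consensus coding intervals by per-base voting.

    evidence: dict mapping a source name to a list of (start, end) intervals,
              0-based and half-open (an assumption of this sketch).
    weights:  dict mapping each source name to a hypothetical trust weight.
    """
    total_weight = sum(weights.values())
    votes = [0.0] * seq_len
    for source, intervals in evidence.items():
        # Using a set means overlapping intervals from one source count once.
        for pos in covered_positions(intervals):
            if 0 <= pos < seq_len:
                votes[pos] += weights[source]

    # Collapse consecutive coding bases into half-open intervals.
    consensus = []
    run_start = None
    for pos in range(seq_len):
        is_coding = votes[pos] / total_weight >= threshold
        if is_coding and run_start is None:
            run_start = pos
        elif not is_coding and run_start is not None:
            consensus.append((run_start, pos))
            run_start = None
    if run_start is not None:
        consensus.append((run_start, seq_len))
    return consensus


def nucleotide_accuracy(predicted, truth):
    """Return (sensitivity, specificity) at the nucleotide level.

    Sensitivity = TP / (TP + FN); specificity = TP / (TP + FP), following
    the gene-prediction convention assumed in the lead-in.
    """
    pred = covered_positions(predicted)
    true = covered_positions(truth)
    true_positives = len(pred & true)
    sensitivity = true_positives / len(true) if true else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy example with made-up sources and weights.
example_weights = {"gene_finder_a": 1.0, "gene_finder_b": 1.0, "est_alignment": 2.0}
example_evidence = {
    "gene_finder_a": [(5, 15)],
    "gene_finder_b": [(8, 20)],
    "est_alignment": [(10, 18)],
}
consensus = combine_by_weighted_vote(30, example_evidence, example_weights)
sensitivity, specificity = nucleotide_accuracy(consensus, truth=[(10, 18)])
print(consensus, sensitivity, specificity)
```

In this toy run the alignment evidence is weighted more heavily than either gene finder, which reflects a design choice of the sketch (direct transcript evidence is trusted more) rather than anything stated in the abstract. A real combiner would also have to enforce gene-model constraints such as reading frame and valid splice sites, which per-base voting ignores.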

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Arabidopsis / genetics
  • Base Composition / genetics
  • Brassica / genetics
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • DNA, Complementary / genetics
  • DNA, Plant / genetics
  • Genes, Plant / genetics*
  • Humans
  • Models, Genetic
  • Predictive Value of Tests
  • Software / statistics & numerical data

Substances

  • DNA, Complementary
  • DNA, Plant