Multiple-sequence functional annotation and the generalized hidden Markov phylogeny

Bioinformatics. 2004 Aug 12;20(12):1850-60. doi: 10.1093/bioinformatics/bth153. Epub 2004 Feb 26.

Abstract

Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints.
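
As a rough illustration of the shadowing idea, the sketch below scores one gap-free alignment column by comparing its likelihood under slow (constrained) and fast (unconstrained) evolution on a fixed tree. It is not the GHMP or the shadower implementation: the Jukes-Cantor substitution model, the tree topology, the branch lengths, the rate scales and all function names are assumptions chosen for the example.

```python
import math

BASES = "ACGT"

# Hypothetical tree: an internal node is a list of (child, branch_length)
# pairs and a leaf is a species name. Branch lengths are made up.
TREE = [([("human", 0.01), ("chimp", 0.01)], 0.02), ("baboon", 0.06)]

def jc_prob(same, t):
    """Jukes-Cantor probability of ending in the same / a given different base
    after a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial(node, column, scale):
    """Felsenstein pruning: P(observed leaves below node | base at node),
    returned as a list indexed like BASES."""
    if isinstance(node, str):  # leaf: indicator vector for the observed base
        return [1.0 if column[node] == b else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, length in node:
        below = partial(child, column, scale)
        for i, x in enumerate(BASES):
            result[i] *= sum(
                jc_prob(x == y, scale * length) * below[j]
                for j, y in enumerate(BASES)
            )
    return result

def column_loglik(tree, column, scale):
    """Log-likelihood of one column with a uniform base distribution at the root."""
    return math.log(sum(0.25 * p for p in partial(tree, column, scale)))

def shadow_score(tree, column, slow=0.3, fast=1.0):
    """Log-likelihood ratio of slow versus fast evolution (assumed rate scales).
    Positive values favour the slow, conserved explanation."""
    return column_loglik(tree, column, slow) - column_loglik(tree, column, fast)

conserved = {"human": "A", "chimp": "A", "baboon": "A"}
variable = {"human": "A", "chimp": "G", "baboon": "A"}
print(shadow_score(TREE, conserved), shadow_score(TREE, variable))
```

In a hidden Markov phylogeny, column likelihoods of this kind would play the role of state emission probabilities, with one evolutionary model per functional state; the generalized variant additionally models how long each state persists.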

Results: We show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe shadower, our implementation of such a prediction system. On a small sample of diverse exonic regions, shadower outperforms previously reported ab initio gene finders, including comparative human-mouse approaches. Finally, an empirical analysis of shadower's performance shows that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation.
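
For readers unfamiliar with how exon demarcation is usually scored, the sketch below computes nucleotide-level sensitivity and specificity from predicted and annotated exon intervals. It assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the abstract does not state which definitions were used, and the function names and half-open interval format are hypothetical.

```python
def exon_mask(exons, length):
    """Mark every position covered by a half-open [start, end) exon interval."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask

def nucleotide_accuracy(predicted, annotated, length):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP)."""
    pred = exon_mask(predicted, length)
    true = exon_mask(annotated, length)
    tp = sum(p and t for p, t in zip(pred, true))
    fp = sum(p and not t for p, t in zip(pred, true))
    fn = sum(t and not p for p, t in zip(pred, true))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity

# A predicted exon that overlaps half of the annotated one.
print(nucleotide_accuracy([(10, 20)], [(15, 25)], 30))  # (0.5, 0.5)
```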

Availability: A Web server is available at http://bonaire.lbl.gov/shadower

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • Evolution, Molecular*
  • Gene Expression Profiling / methods*
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Phylogeny
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Software