Efficient decoding algorithms for generalized hidden Markov model gene finders
- PMID: 15667658
- PMCID: PMC552317
- DOI: 10.1186/1471-2105-6-16
Abstract
Background: The Generalized Hidden Markov Model (GHMM) has proven a useful framework for computational gene prediction in eukaryotic genomes, owing to its flexibility and probabilistic underpinnings. As the focus of the gene-finding community shifts toward the use of homology information to improve prediction accuracy, extensions to the basic GHMM are being explored as ways to integrate this information into the prediction process. Particularly prominent among these extensions are techniques that predict genes in two or more genomes simultaneously, which significantly increases the computational cost of prediction and highlights the importance of speed and memory efficiency in the implementation of the underlying GHMM algorithms. Unfortunately, implementing an efficient GHMM-based gene finder is already a nontrivial task, and it can be expected to grow more onerous as our models increase in complexity.
Results: As a first step toward addressing the implementation challenges of these next-generation systems, we describe in detail two software architectures for GHMM-based gene finders, one comprising the common array-based approach and the other a highly optimized algorithm that requires significantly less memory while achieving virtually identical speed. We then show how both of these architectures can be accelerated by a factor of two by optimizing their content sensors. We finish with a brief illustration of the impact these optimizations have had on the feasibility of our new homology-based gene finder, TWAIN.
Conclusions: By describing a number of optimizations for GHMM-based gene finders and making available two complete open-source software systems embodying these methods, we hope that others will be better able to explore promising extensions to the GHMM framework, thereby improving the state of the art in gene prediction techniques.
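For readers unfamiliar with GHMM decoding, the following is a minimal sketch of an array-based Viterbi decoder for a toy GHMM, in which each state emits a whole segment with an explicit duration. It is not the authors' implementation: the function names (`prefix_log_scores`, `ghmm_viterbi`, `toy_model`), the two-state model, and all probabilities are hypothetical. The prefix-sum array used to score a segment's content in constant time is one generic way to make a content sensor cheaper, and is not necessarily the optimization the abstract refers to.

```python
import math
from itertools import accumulate

NEG_INF = -math.inf


def prefix_log_scores(seq, log_emit):
    """prefix[i] is the summed log emission score of seq[:i], so the score
    of any interval seq[start:end] is prefix[end] - prefix[start]."""
    return [0.0] + list(accumulate(log_emit[c] for c in seq))


def ghmm_viterbi(seq, states, log_init, log_trans, log_dur, log_emit, max_dur):
    """Return (segments, log_score) for the best parse of seq.

    segments is a list of (state, start, end) with half-open intervals.
    Assumes every emission probability is strictly positive.
    """
    n = len(seq)
    prefix = {s: prefix_log_scores(seq, log_emit[s]) for s in states}
    # best[i][s]: best log score of a parse of seq[:i] whose final segment
    # is emitted by state s and ends exactly at position i.
    best = [{s: NEG_INF for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]

    for end in range(1, n + 1):
        for s in states:
            for d in range(1, min(max_dur, end) + 1):
                if d not in log_dur[s]:
                    continue  # duration d has zero probability in state s
                start = end - d
                segment = log_dur[s][d] + prefix[s][end] - prefix[s][start]
                if start == 0:
                    candidates = [(log_init[s] + segment, None)]
                else:
                    candidates = [
                        (best[start][p] + log_trans[p][s] + segment, p)
                        for p in states
                        if s in log_trans[p]
                    ]
                for score, prev in candidates:
                    if score > best[end][s]:
                        best[end][s] = score
                        back[end][s] = (start, prev)

    last = max(states, key=lambda s: best[n][s])
    if best[n][last] == NEG_INF:
        raise ValueError("no valid parse for this sequence")

    # Trace back from the end of the sequence to recover the segments.
    segments, end, state = [], n, last
    while state is not None:
        start, prev = back[end][state]
        segments.append((state, start, end))
        end, state = start, prev
    segments.reverse()
    return segments, best[n][last]


def toy_model():
    """Hypothetical two-state model: A prefers 'a', B prefers 'b'."""
    uniform = {d: math.log(0.25) for d in range(1, 5)}
    return dict(
        states=["A", "B"],
        log_init={"A": math.log(0.5), "B": math.log(0.5)},
        log_trans={"A": {"B": 0.0}, "B": {"A": 0.0}},
        log_dur={"A": uniform, "B": uniform},
        log_emit={
            "A": {"a": math.log(0.9), "b": math.log(0.1)},
            "B": {"a": math.log(0.1), "b": math.log(0.9)},
        },
        max_dur=4,
    )


segments, score = ghmm_viterbi("aaabbb", **toy_model())
```

In this sketch the `best` and `back` arrays hold one entry per sequence position and state, which is the memory cost that grows with sequence length in an array-based design. The running time is proportional to the sequence length times `max_dur` times the square of the number of states.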
Similar articles
- An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinformatics. 2004 Dec 21;5:206. doi: 10.1186/1471-2105-5-206. PMID: 15613242. Free PMC article.
- TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004 Nov 1;20(16):2878-9. doi: 10.1093/bioinformatics/bth315. Epub 2004 May 14. PMID: 15145805.
- Integrating database homology in a probabilistic gene structure model. Pac Symp Biocomput. 1997:232-44. PMID: 9390295.
- Computational approaches to gene prediction. J Microbiol. 2006 Apr;44(2):137-44. PMID: 16728949. Review.
- An Experimental Approach to Genome Annotation: This report is based on a colloquium sponsored by the American Academy of Microbiology held July 19-20, 2004, in Washington, DC. Washington (DC): American Society for Microbiology; 2004. PMID: 33001599. Free Books & Documents. Review.
Cited by
- Duration learning for analysis of nanopore ionic current blockades. BMC Bioinformatics. 2007 Nov 1;8 Suppl 7(Suppl 7):S14. doi: 10.1186/1471-2105-8-S7-S14. PMID: 18047713. Free PMC article.
- JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biol. 2006;7 Suppl 1(Suppl 1):S9.1-13. doi: 10.1186/gb-2006-7-s1-s9. Epub 2006 Aug 7. PMID: 16925843. Free PMC article.
- Improved transcript isoform discovery using ORF graphs. Bioinformatics. 2014 Jul 15;30(14):1958-64. doi: 10.1093/bioinformatics/btu160. Epub 2014 Mar 22. PMID: 24659106. Free PMC article.
- High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE. Bioinformatics. 2017 May 15;33(10):1437-1446. doi: 10.1093/bioinformatics/btw799. PMID: 28011790. Free PMC article.
- An evaluation of contemporary hidden Markov model genefinders with a predicted exon taxonomy. Nucleic Acids Res. 2007;35(1):317-24. doi: 10.1093/nar/gkl1026. Epub 2006 Dec 14. PMID: 17170005. Free PMC article.
