Gene prediction with a hidden Markov model and a new intron submodel
- PMID: 14534192
- DOI: 10.1093/bioinformatics/btg1080
Abstract
Motivation: The problem of finding the genes in eukaryotic DNA sequences by computational methods is still not satisfactorily solved. Gene finding programs have achieved relatively high accuracy on short genomic sequences but do not perform well on longer sequences with an unknown number of genes in them. On such longer sequences, existing programs tend to predict many false exons.
Results: We have developed a new program, AUGUSTUS, for the ab initio prediction of protein coding genes in eukaryotic genomes. The program is based on a Hidden Markov Model and integrates a number of known methods and submodels. It employs a new way of modeling intron lengths. We use a new donor splice site model and a new model for a short region directly upstream of the donor splice site that takes the reading frame into account, and we apply a method that allows better GC-content dependent parameter estimation. On longer sequences, AUGUSTUS predicts far more human and Drosophila genes accurately than the ab initio gene prediction programs we compared it with, while at the same time being more specific.
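To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding over a hypothetical two-state (exon/intron) model. It is a generic textbook illustration, not the AUGUSTUS model: the state set, start, transition, and emission probabilities (`START`, `TRANS`, `EMIT`) are invented toy values, and real gene finders use many more states (splice sites, reading frames, intergenic regions) and richer emission models.

```python
import math

# Hypothetical toy parameters for illustration only; NOT the AUGUSTUS model.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},    # toy: GC-rich
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # toy: AT-rich
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string (log-space Viterbi)."""
    log = math.log
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            ptr[s] = prev
            new[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][base])
        back.append(ptr)
        score = new
    # Trace back from the best-scoring final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("GCGCGCGCGCATATATATAT"))
```

With these toy values, a GC-rich stretch followed by an AT-rich stretch is labeled exon then intron. A self-loop such as `TRANS["intron"]["intron"] = 0.9` implies geometrically distributed segment lengths (mean 1 / (1 - 0.9) = 10 bases); dedicated length submodels, such as the intron submodel the abstract mentions, are one way to move beyond that constraint.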
Availability: A web interface for AUGUSTUS and the executable program are located at http://augustus.gobics.de.
Similar articles
- AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. doi: 10.1093/nar/gkh379. PMID: 15215400. Free PMC article.
- Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006 Feb 9;7:62. doi: 10.1186/1471-2105-7-62. PMID: 16469098. Free PMC article.
- Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants. Bioinformatics. 2005 Nov 1;21 Suppl 3:iii20-30. doi: 10.1093/bioinformatics/bti1205. PMID: 16306388.
- Finding cis-regulatory modules in Drosophila using phylogenetic hidden Markov models. Bioinformatics. 2007 Aug 15;23(16):2031-7. doi: 10.1093/bioinformatics/btm299. Epub 2007 Jun 5. PMID: 17550911.
- Computational methods for ab initio and comparative gene finding. Methods Mol Biol. 2010;609:269-84. doi: 10.1007/978-1-60327-241-4_16. PMID: 20221925. Review.
Cited by
- Cotton D genome assemblies built with long-read data unveil mechanisms of centromere evolution and stress tolerance divergence. BMC Biol. 2021 Jun 3;19(1):115. doi: 10.1186/s12915-021-01041-0. PMID: 34082735. Free PMC article.
- The draft genome of the Temminck's tragopan (Tragopan temminckii) with evolutionary implications. BMC Genomics. 2023 Dec 7;24(1):751. doi: 10.1186/s12864-023-09857-6. PMID: 38062370. Free PMC article.
- The wheat powdery mildew genome shows the unique evolution of an obligate biotroph. Nat Genet. 2013 Sep;45(9):1092-6. doi: 10.1038/ng.2704. Epub 2013 Jul 14. PMID: 23852167.
- Two High-Quality Cygnus Genome Assemblies Reveal Genomic Variations Associated with Plumage Color. Int J Mol Sci. 2023 Nov 29;24(23):16953. doi: 10.3390/ijms242316953. PMID: 38069278. Free PMC article.
- Chromosome-level Genomes Reveal the Genetic Basis of Descending Dysploidy and Sex Determination in Morus Plants. Genomics Proteomics Bioinformatics. 2022 Dec;20(6):1119-1137. doi: 10.1016/j.gpb.2022.08.005. Epub 2022 Aug 30. PMID: 36055564. Free PMC article.
