Hidden Markov models for detecting remote protein homologies
- PMID: 9927713
- DOI: 10.1093/bioinformatics/14.10.846
Abstract
Motivation: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases.
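The iterative search loop described above can be illustrated with a toy sketch. This is not the SAM-T98 implementation: it assumes an ungapped, fixed-length column profile in place of a trained HMM, a uniform background for scoring, and a hand-picked threshold. The names `build_profile`, `log_odds`, and `iterative_search` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Toy stand-in for HMM training: per-column log-probabilities
    estimated from equal-length sequences with add-one smoothing."""
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        profile.append(
            {a: math.log((counts[a] + pseudocount) / total) for a in AMINO_ACIDS}
        )
    return profile


def log_odds(profile, seq):
    """Log-odds of seq under the profile versus a uniform background."""
    background = math.log(1.0 / len(AMINO_ACIDS))
    return sum(col[a] - background for col, a in zip(profile, seq))


def iterative_search(target, database, threshold, max_iters=4):
    """Start from one target, search the database with the current model,
    add the hits, rebuild the model, and repeat until nothing new is found."""
    members = [target]
    for _ in range(max_iters):
        profile = build_profile(members)
        hits = [
            s for s in database
            if s not in members and log_odds(profile, s) >= threshold
        ]
        if not hits:
            break
        members.extend(hits)
    return members


# Hypothetical five-residue "database" for illustration only.
database = ["ACDEY", "ACDWY", "WWWWW"]
found = iterative_search("ACDEF", database, threshold=2.0)
print(found)
```

In this toy, "ACDWY" falls below the threshold against the single-sequence model and is only picked up in the second round, after "ACDEY" has broadened the profile. That is the general motivation for iterating, shown here under the simplifying assumptions stated above.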
Methods: We evaluate the SAM-T98 method with four test sets. Three of them are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to intermediate sequence search (ISS), but using BLAST instead of FASTA.
Results: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins)-domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model.
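The difference between normalizing against a uniform null model and against a reversed model can be sketched as follows. This is a minimal illustration, not the SAM-T98 scoring code: it assumes an ungapped profile whose columns are simply listed in reverse order to form the reversed model, and the helper names (`consensus_profile`, `uniform_null_score`, `reverse_null_score`) and the match probability of 0.81 are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def consensus_profile(consensus, p_match=0.81):
    """Toy ungapped model: each column favors one residue with probability
    p_match and spreads the remainder evenly over the other 19."""
    p_other = (1.0 - p_match) / (len(AMINO_ACIDS) - 1)
    return [
        {a: (p_match if a == c else p_other) for a in AMINO_ACIDS}
        for c in consensus
    ]


def log_prob(profile, seq):
    """Log-probability of an equal-length sequence under the profile."""
    if len(seq) != len(profile):
        raise ValueError("toy model only scores sequences of the model length")
    return sum(math.log(col[a]) for col, a in zip(profile, seq))


def uniform_null_score(profile, seq):
    """Log-odds against a null that emits every residue with probability 1/20."""
    return log_prob(profile, seq) - len(seq) * math.log(1.0 / len(AMINO_ACIDS))


def reverse_null_score(profile, seq):
    """Log-odds against the same model with its columns in reverse order."""
    return log_prob(profile, seq) - log_prob(profile[::-1], seq)


ordered = consensus_profile("ACDEFG")   # order-dependent model
biased = consensus_profile("AAAAAA")    # composition-only model

print(uniform_null_score(ordered, "ACDEFG"), reverse_null_score(ordered, "ACDEFG"))
print(uniform_null_score(biased, "AAAAAA"), reverse_null_score(biased, "AAAAAA"))
```

In this toy, the order-dependent model keeps a large positive score under both normalizations, while the composition-only model scores well against the uniform null but exactly zero against its own reversal, because reversing it changes nothing. A real HMM with insert and delete states would need the reversed score computed by the full alignment algorithm rather than by reordering columns.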
Availability: A World Wide Web server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite, can be found at http://www.cse.ucsc.edu/research/compbio/
Contact: karplus@cse.ucsc.edu; http://www.cse.ucsc.edu/karplus
