Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm

Proteins. 2004 Aug 15;56(3):502-18. doi: 10.1002/prot.20106.

Abstract

This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. Each variant described was found to have a Z-score above which most identified templates have good structural (threading) alignments, Z(struct) (Z(good)). 'Easy' targets with accurate threading alignments are identified as single templates with Z > Z(good) or two templates, each with Z > Z(struct), having a good consensus structure in mutually aligned regions. 'Medium' targets have a pair of templates lacking a consensus structure, or a single template for which Z(struct) < Z < Z(good). PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark composed of 1491 single domain proteins, 41-200 residues long and no more than 30% identical to any threading template. Of the proteins, 878 were found to be easy targets, with 761 having a root mean square deviation (RMSD) from native of less than 6.5 A. The average contact prediction accuracy was 46%, and on average 17.6 residue continuous fragments were predicted with RMSD values of 2.0 A. There were 606 medium targets identified, 87% (31%) of which had good structural (threading) alignments. On average, 9.1 residue, continuous fragments with RMSD of 2.5 A were predicted. Combining easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native; the average target/template sequence identity was 22%. Only nine targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSIBLAST. Similar results were predicted for open reading frames (ORFS) < or =200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identification of weakly homologous/analogous proteins, with very high alignment coverage, both in a comprehensive PDB benchmark as well as in genomes.

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Computer Simulation
  • Databases, Protein
  • Models, Chemical
  • Open Reading Frames
  • Peptide Library
  • Protein Structure, Secondary
  • Protein Structure, Tertiary* / genetics
  • Sequence Alignment
  • Software*
  • Structural Homology, Protein*

Substances

  • Bacterial Proteins
  • Peptide Library