Benchmarking template selection and model quality assessment for high-resolution comparative modeling

Proteins. 2007 Nov 15;69(3):476-85. doi: 10.1002/prot.21531.

Abstract

Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown that selecting the correct template is of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exists with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which an improvement (global RMSD of 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined, and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the target-template sequence similarity, an extra 7% of "best" models can be found.
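
The 0.5 Å criterion above can be illustrated with a minimal sketch. It assumes one plausible reading of "sufficient": the template picked by sequence identity yields a model whose global RMSD is less than 0.5 Å worse than the best model obtainable from any available template. The `Template` class, the function name, and the numbers are all hypothetical and for illustration only; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str            # hypothetical template identifier
    seq_identity: float  # percent sequence identity to the target
    model_rmsd: float    # global RMSD (in Å) of the resulting model vs. the native structure


def selection_is_sufficient(templates, tolerance=0.5):
    """Return True if the highest-identity template gives a model whose RMSD
    is less than `tolerance` Å worse than the best achievable model.

    Assumption: this is only one reading of the abstract's criterion.
    """
    chosen = max(templates, key=lambda t: t.seq_identity)
    best = min(templates, key=lambda t: t.model_rmsd)
    return (chosen.model_rmsd - best.model_rmsd) < tolerance


# Made-up example: the top-identity template is 0.7 Å worse than the best one,
# so a better choice of template would have been a real improvement.
example = [
    Template("tmplA", 62.0, 1.9),
    Template("tmplB", 55.0, 1.2),
    Template("tmplC", 48.0, 2.4),
]
needs_better_selection = not selection_is_sufficient(example)
```

Here the highest-identity template stands in for selection by sequence similarity; any other ranking (for instance, an MQAP score) could be substituted by changing the key used to pick `chosen`.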

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Molecular*
  • Protein Conformation*
  • Protein Structure, Tertiary
  • Sequence Analysis, Protein