Evaluation of structural and evolutionary contributions to deleterious mutation prediction

J Mol Biol. 2002 Sep 27;322(4):891-901. doi: 10.1016/s0022-2836(02)00813-6.


Methods for automated prediction of deleterious protein mutations have utilized both structural and evolutionary information, but the relative contribution of these two factors remains unclear. To address this, we have used a variety of structural and evolutionary features to create simple deleterious mutation models that have been tested on both experimental mutagenesis and human allele data. We find that the most accurate predictions are obtained using two features: a solvent-accessibility term (the Cβ density) and a score derived from homologous sequences (SIFT). A classification tree using these two features has a cross-validated prediction error of 20.5% on an experimental mutagenesis test set when the prior probabilities for deleterious and neutral cases are equal, whereas the prediction error is 28.8% using the Cβ density alone and 22.2% using SIFT alone. The improvement imparted by structure increases when fewer homologs are available: when restricted to three homologs, the prediction error improves from 26.9% using SIFT alone to 22.4% using SIFT and the Cβ density, or to 24.8% using SIFT and a noisy Cβ density term approximating the inaccuracy of ab initio structures modeled by the Rosetta method. We conclude that methods for deleterious mutation prediction should include structural information when fewer than five to ten homologs are available, and that ab initio predicted structures may soon be useful in such cases when high-resolution structures are unavailable.
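
As a rough illustration of the quantities reported in the abstract, the following Python sketch shows one way an equal-prior prediction error and a two-feature decision rule could be computed. It is not the authors' model: the function names, the tree shape, and both cutoffs are hypothetical, chosen only to make the example concrete. Only the percentages passed to `relative_reduction` in the usage note come from the abstract.

```python
from typing import Sequence, Tuple

# Hypothetical feature pair for one mutation: (sift_score, cbeta_density).
Sample = Tuple[float, float]


def balanced_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Prediction error with equal prior weight on both classes.

    Labels: 1 = deleterious, 0 = neutral. The result is the mean of the
    per-class error rates, so class imbalance in the data does not
    dominate the estimate.
    """
    errors = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for truth, pred in zip(y_true, y_pred):
        counts[truth] += 1
        if truth != pred:
            errors[truth] += 1
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (errors[0] / counts[0] + errors[1] / counts[1])


def two_feature_rule(sample: Sample,
                     sift_cutoff: float = 0.05,
                     density_cutoff: float = 10.0) -> int:
    """Toy depth-two tree over SIFT score and C-beta density.

    ASSUMPTION: the tree shape and both cutoffs are illustrative only;
    the abstract does not report the fitted tree.
    """
    sift_score, cbeta_density = sample
    if sift_score < sift_cutoff:
        return 1  # poorly tolerated substitution: call deleterious
    if cbeta_density > density_cutoff:
        return 1  # densely packed (buried) site: call deleterious
    return 0      # otherwise call neutral


def relative_reduction(error_before: float, error_after: float) -> float:
    """Fraction of the baseline error removed by the improved model."""
    return (error_before - error_after) / error_before
```

With the figures from the abstract, `relative_reduction(26.9, 22.4)` (three homologs) is larger than `relative_reduction(22.2, 20.5)` (all homologs), which is the sense in which structure helps more when fewer homologs are available.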

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins*
  • Bacteriophage T4 / enzymology
  • Escherichia coli Proteins / genetics
  • Evolution, Molecular*
  • HIV Protease / genetics
  • Humans
  • Lac Repressors
  • Models, Genetic*
  • Muramidase / genetics
  • Nonlinear Dynamics*
  • Protein Conformation
  • Repressor Proteins / genetics
  • Sequence Deletion*


Substances

  • Bacterial Proteins
  • Escherichia coli Proteins
  • Lac Repressors
  • Repressor Proteins
  • Muramidase
  • HIV Protease