Tertiary structural propensities reveal fundamental sequence/structure relationships

Structure. 2015 May 5;23(5):961-971. doi: 10.1016/j.str.2015.03.015. Epub 2015 Apr 23.

Abstract

Extracting useful generalizations from the continually growing Protein Data Bank (PDB) is of central importance. We hypothesize that the PDB contains valuable quantitative information on the level of local tertiary structural motifs (TERMs). We show that by breaking a protein structure into its constituent TERMs, and querying the PDB to characterize the natural ensemble matching each, we can estimate the compatibility of the structure with a given amino acid sequence through a metric we term "structure score." Considering submissions from recent Critical Assessment of Structure Prediction (CASP) experiments, we found a strong correlation (R = 0.69) between structure score and model accuracy, with poorly predicted regions readily identifiable. This performance exceeds that of leading atomistic statistical energy functions. Furthermore, TERM-based analysis of two prototypical multi-state proteins rapidly produced structural insights fully consistent with prior extensive experimental studies. We thus find that TERM-based analysis should have considerable utility for protein structural biology.
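The abstract does not give a formula for the structure score. As a rough illustration only, the sketch below assumes each TERM has already been matched against the PDB, giving a hypothetical list of aligned sequences from its natural ensemble. It then scores a query sequence by summing per-position negative log frequencies, with pseudocounts, over those matches. The names `position_neg_log_probs`, `term_energy`, `structure_score`, and `pearson_r`, the pseudocount of 1.0, and the per-residue averaging are all assumptions made for this example and are not the published method. The correlation helper only shows how such a score could be compared against model accuracy.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_neg_log_probs(matched_seqs, pseudocount=1.0):
    """Per-position -log p(aa) tables built from a TERM's matched sequences.

    matched_seqs: equal-length strings, one per structural match (hypothetical input).
    """
    total = len(matched_seqs) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for i in range(len(matched_seqs[0])):
        counts = Counter(seq[i] for seq in matched_seqs)
        tables.append({
            aa: -math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS
        })
    return tables


def term_energy(segment, matched_seqs, pseudocount=1.0):
    """Sum of -log p over the query residues in one TERM (lower = more native-like)."""
    tables = position_neg_log_probs(matched_seqs, pseudocount)
    if len(segment) != len(tables):
        raise ValueError("segment length must match the TERM length")
    return sum(table[aa] for table, aa in zip(tables, segment))


def structure_score(sequence, terms, pseudocount=1.0):
    """Average per-residue TERM energy of `sequence` across all TERMs.

    terms: list of (residue_indices, matched_seqs) pairs; residue_indices selects
    the query residues forming each TERM, in the same order as matched_seqs.
    """
    total_energy = 0.0
    total_residues = 0
    for indices, matched_seqs in terms:
        segment = "".join(sequence[i] for i in indices)
        total_energy += term_energy(segment, matched_seqs, pseudocount)
        total_residues += len(indices)
    return total_energy / total_residues


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

For example, with a single two-residue TERM whose matches are `["AL", "AL", "AV"]`, `structure_score("AL", [([0, 1], ["AL", "AL", "AV"])])` is lower (more compatible) than the same call with `"GW"`. Because lower values mean better compatibility in this sketch, such a score would be expected to correlate negatively with model accuracy unless its sign is flipped.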

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein
  • Models, Molecular
  • Models, Statistical
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Sequence Analysis, Protein

Substances

  • Proteins