Can computationally designed protein sequences improve secondary structure prediction?

Protein Eng Des Sel. 2011 May;24(5):455-61. doi: 10.1093/protein/gzr003. Epub 2011 Jan 31.


Computational sequence design methods are used to engineer proteins with desired properties such as increased thermal stability and novel function. These algorithms can also be used to identify an envelope of sequences that may be compatible with a particular protein fold topology. We therefore hypothesized that sequence-property prediction, specifically of secondary structure, could be significantly enhanced by using a large database of computationally designed sequences. We performed a large-scale test of this hypothesis with 6511 diverse protein domains and 50 designed sequences per domain. After analyzing the inherent accuracy of the designed-sequence database, we found it necessary to constrain the fraction of the native sequence that was allowed to change. With mutational constraints, accuracy improved relative to unconstrained design, but the diversity of the designed sequences, and hence the effective size of the database, was moderately reduced. Overall, the best three-state prediction accuracy (Q(3)) that we achieved was nearly one percentage point higher than that obtained with a natural sequence database alone, well below the theoretical maximum improvement of 8-10 percentage points. Furthermore, our nascent method was used to augment the state-of-the-art PSIPRED program, yielding a gain of one percentage point.
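
For readers unfamiliar with the metric, three-state accuracy (Q3) is conventionally the percentage of residues whose secondary structure state (helix H, strand E, or coil C) is predicted correctly. The sketch below illustrates that standard definition only. The function name `q3_accuracy` and the example strings are hypothetical and are not taken from the paper, whose evaluation pipeline is not described in this abstract.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are per-residue label strings over H (helix),
    E (strand) and C (coil), aligned position by position.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 17-residue example (illustrative labels, not real data).
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.1f}%")
```

On this scale, the gain of about one percentage point reported in the abstract corresponds to roughly one additional correctly predicted residue per hundred.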

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology / methods*
  • Databases, Protein
  • Fuzzy Logic
  • Neural Networks, Computer
  • Protein Engineering / methods*
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteins / metabolism

Substances

  • Proteins