The challenge of protein structure determination--lessons from structural genomics

Protein Sci. 2007 Nov;16(11):2472-82. doi: 10.1110/ps.073037907.


The experimental determination of protein structure is marred by a high failure rate at many stages. With the availability of large quantities of data from high-throughput structure determination in structural genomics centers, we can now learn to recognize protein features correlated with failures; thus, we can recognize proteins more likely to succeed and eventually learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses "crystallization feasibility." The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The "crystallization feasibility" score is being applied to target selection at the Joint Center for Structural Genomics and is contributing to increasing the success rate, lowering the costs, and shortening the time for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, indicating that this score would also be of significant interest to structural biology, as well as to molecular biology and biochemistry laboratories.
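
The abstract does not give the formula, so the following is only an illustrative sketch of the general idea: per-feature evidence is combined into one score, and the score is checked with a leave-one-out jackknife. The feature names (`feature_a`, `feature_b`), their bins, the additive log-odds form, the pseudocount, and the toy records are all assumptions made for illustration. They are not the features, weights, or data used in the paper.

```python
import math
from collections import Counter

# Toy records: (binned features, did the target succeed?). Entirely hypothetical.
records = [
    ({"feature_a": "low", "feature_b": "low"}, True),
    ({"feature_a": "low", "feature_b": "low"}, True),
    ({"feature_a": "low", "feature_b": "high"}, True),
    ({"feature_a": "high", "feature_b": "high"}, False),
    ({"feature_a": "high", "feature_b": "high"}, False),
    ({"feature_a": "high", "feature_b": "low"}, False),
]


def fit_log_odds(training, pseudocount=1.0):
    """Estimate log(P(bin | success) / P(bin | failure)) for every observed bin."""
    n_success = sum(1 for _, ok in training if ok)
    n_failure = len(training) - n_success
    counts = {True: Counter(), False: Counter()}
    for features, ok in training:
        for name, value in features.items():
            counts[ok][(name, value)] += 1
    table = {}
    for key in set(counts[True]) | set(counts[False]):
        # The pseudocount keeps the ratio finite when a bin is unseen in one class.
        p_success = (counts[True][key] + pseudocount) / (n_success + 2 * pseudocount)
        p_failure = (counts[False][key] + pseudocount) / (n_failure + 2 * pseudocount)
        table[key] = math.log(p_success / p_failure)
    return table


def score(table, features):
    """Sum the per-feature log-odds; bins never seen in training contribute 0."""
    return sum(table.get((name, value), 0.0) for name, value in features.items())


def jackknife_scores(data):
    """Score each record with a table fitted on all the other records."""
    results = []
    for i, (features, ok) in enumerate(data):
        training = data[:i] + data[i + 1:]
        results.append((score(fit_log_odds(training), features), ok))
    return results


if __name__ == "__main__":
    for held_out_score, succeeded in jackknife_scores(records):
        print(f"{held_out_score:+.3f}  succeeded={succeeded}")
```

In this sketch a positive score means the held-out target resembles past successes more than past failures. Because each target is scored by a table that never saw it, the jackknife gives a less optimistic estimate than scoring the training data directly.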

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Crystallization
  • Crystallography, X-Ray / methods*
  • Databases, Protein
  • Genomics / methods
  • Isoelectric Focusing
  • Magnetic Resonance Spectroscopy / methods
  • Probability
  • Protein Conformation
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteomics / methods*
  • Sequence Analysis, Protein


Substances

  • Proteins