Genome pool strategy for structural coverage of protein families

Structure. 2008 Nov 12;16(11):1659-67. doi: 10.1016/j.str.2008.08.018.


Even closely homologous proteins often have different crystallization properties and propensities. This observation can be used to introduce an additional dimension into crystallization trials by simultaneously targeting multiple homologs in what we call a "genome pool" strategy. We show that this strategy works because the physicochemical properties of proteins that correlate with crystallization success have a surprisingly broad distribution within most protein families. There are also "easy" and "difficult" families in which this distribution is tilted in one direction. This leads to uneven structural coverage of protein families, with more of the "easy" ones solved. Increasing the size of the "genome pool" can improve the chances of solving the "difficult" ones. In contrast, our analysis does not indicate that any specific genomes are "easy" or "difficult". Finally, we show that the group of proteins with known 3D structures is systematically different from the general pool of known proteins, and we assess the structural consequences of these differences.
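
The claim that a larger "genome pool" improves the chances for "difficult" families can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes that each homolog crystallizes independently with its own success probability, and the function name and all numeric values are hypothetical, chosen only for illustration.

```python
from math import prod


def pool_success_probability(per_homolog_probs):
    """Chance that at least one homolog in the pool yields a crystal.

    Assumption (not from the paper): outcomes for different homologs are
    independent, so the pool fails only if every homolog fails.
    """
    for p in per_homolog_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # P(at least one success) = 1 - product of the individual failure chances
    return 1.0 - prod(1.0 - p for p in per_homolog_probs)


# Hypothetical per-homolog success chances for a "difficult" family.
small_pool = [0.05] * 2
large_pool = [0.05] * 10

print(f"pool of 2:  {pool_success_probability(small_pool):.3f}")
print(f"pool of 10: {pool_success_probability(large_pool):.3f}")
```

Under these assumptions, adding homologs can only raise the chance that at least one member of a family crystallizes, and the gain is largest when the per-homolog probabilities are spread broadly rather than uniformly low.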

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence
  • Archaea / genetics
  • Bacteria / genetics
  • Crystallography, X-Ray
  • Databases, Protein
  • Gene Pool*
  • Genome
  • Probability
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics*
  • Sequence Alignment
  • Sequence Homology, Amino Acid
  • Species Specificity


Substances

  • Proteins