Prediction of protein crystallization using collocation of amino acid pairs

Biochem Biophys Res Commun. 2007 Apr 13;355(3):764-9. doi: 10.1016/j.bbrc.2007.02.040. Epub 2007 Feb 15.

Abstract

Although more than 80% of the protein structures in the PDB were determined using X-ray crystallography, in some cases only 42% of soluble purified proteins yield crystals. Since experimental verification of a protein's ability to crystallize is relatively expensive and time-consuming, we propose a new in silico prediction system, called CRYSTALP, which is based on the protein's sequence. CRYSTALP uses a novel feature-based sequence representation and applies a Naïve Bayes classifier. It was compared with a recent competing in silico method, SECRET [P. Smialowski, T. Schmidt, J. Cox, A. Kirschner, D. Frishman, Will my protein crystallize? A sequence-based predictor, Proteins 62 (2) (2006) 343-355], and with other state-of-the-art classifiers. Based on experimental tests, CRYSTALP is shown to predict crystallization with 77.5% accuracy, which is over 10% higher than SECRET's accuracy and higher than the accuracy of the other considered classifiers. CRYSTALP represents sequences with a different feature set than SECRET, one that contains over 50% fewer features. Additionally, the features used by CRYSTALP may help to discover intra-molecular markers that influence protein crystallization.
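The abstract does not define the features, so the following is only a minimal sketch of one plausible reading of "collocation of amino acid pairs": the frequency of residue pairs separated by a fixed number of intervening positions. The function name `collocated_pair_frequencies`, the `gap` parameter, and the normalization by the total pair count are assumptions made for illustration, not the published CRYSTALP feature definition.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def collocated_pair_frequencies(sequence, gap):
    """Return the relative frequency of each residue pair (a, b) in which
    b follows a with exactly `gap` residues in between.

    Hypothetical helper: the pair definition and the normalization are
    assumptions, not taken from the paper.
    """
    seq = sequence.upper()
    step = gap + 1  # distance between the two positions of a pair
    counts = Counter()
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        # Skip pairs containing non-standard symbols (e.g. X or gaps).
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}
```

Under these assumptions, calling the function with `gap=0` yields ordinary dipeptide frequencies, and larger gaps capture pairs that sit further apart in the chain. Frequencies computed for a few gap values could then be concatenated into one feature vector and passed to a Naïve Bayes classifier.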

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Computational Biology / methods*
  • Crystallization
  • Databases, Protein
  • Protein Conformation
  • Proteins / chemistry*
  • Software*

Substances

  • Amino Acids
  • Proteins