A graphic representation of protein sequence and predicting the subcellular locations of prokaryotic proteins

Int J Biochem Cell Biol. 2002 Mar;34(3):298-307. doi: 10.1016/s1357-2725(01)00121-2.


The Zp curve, a three-dimensional space-curve representation of a protein's primary sequence based on the hydrophobicity and charge properties of the amino acid residues along the sequence, is proposed. Using the Zp parameters extracted from the three components of the Zp curve together with the Bayes discriminant algorithm, the subcellular locations of prokaryotic proteins were predicted. An accuracy of 81.5% was achieved in the cross-validation test using 13 parameters extracted from the curve, for a database of 997 prokaryotic proteins. This result is slightly better than that of the neural network method based on amino acid composition (80.9%) for the same database. By combining the amino acid composition with the Zp parameters, an overall predictive accuracy of 89.6% can be achieved, about 3% higher than that of the Bayes discriminant algorithm based on amino acid composition alone for the same database. Predictions were also performed on a larger dataset derived from version 39 of the SWISS-PROT databank and on two datasets with different levels of sequence similarity. Even for the dataset without sequence similarity, the improvement can reach 4.4% in the cross-validation test. The results indicate that the Zp parameters are effective in representing the information contained in a protein's primary sequence. This method of extracting information from the primary structure may be useful in other areas of protein research.
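To make the idea of a three-component sequence curve concrete, the following is a minimal sketch of one way a cumulative curve could be built from hydrophobicity and charge. The abstract does not give the actual Zp curve definition, so the residue classes, the three component rules, and the function name `zp_like_curve` are all illustrative assumptions and should not be read as the authors' formulation.

```python
# Assumed residue classes (hypothetical; the abstract does not list them).
HYDROPHOBIC = set("AVLIMFWPGC")
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def zp_like_curve(sequence):
    """Return one (x, y, z) point per residue of an assumed cumulative curve.

    Assumed component rules (illustrative only):
      x: +1 for a hydrophobic residue, -1 otherwise
      y: +1 for a charged residue, -1 otherwise
      z: +1 for a positive residue, -1 for a negative one, 0 otherwise
    Residues outside the sets above (including unknown symbols such as X)
    are treated as hydrophilic and uncharged.
    """
    x = y = z = 0
    points = []
    for residue in sequence.upper():
        positive = residue in POSITIVE
        negative = residue in NEGATIVE
        x += 1 if residue in HYDROPHOBIC else -1
        y += 1 if (positive or negative) else -1
        z += int(positive) - int(negative)
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Short made-up peptide, used only to show the shape of the output.
    for point in zp_like_curve("MKDLA"):
        print(point)
```

Summary statistics of such a curve (for example, the end point or per-component averages) could then serve as numeric features for a classifier; which 13 parameters the authors actually extract is not stated in the abstract.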

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence*
  • Bayes Theorem
  • Prokaryotic Cells* / metabolism
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism*


Substances

  • Proteins