Prediction of protein phosphorylation sites by using the composition of k-spaced amino acid pairs

PLoS One. 2012;7(10):e46302. doi: 10.1371/journal.pone.0046302. Epub 2012 Oct 22.


As one of the most widespread protein post-translational modifications, phosphorylation is involved in many biological processes, such as the cell cycle and apoptosis. Identifying phosphorylated substrates and their corresponding sites will facilitate understanding of the molecular mechanism of phosphorylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of phosphorylation sites is highly desirable because of its convenience and speed. In this paper, a new bioinformatics tool named CKSAAP_PhSite was developed that ignores kinase information and uses only primary sequence information to predict protein phosphorylation sites. The key feature of CKSAAP_PhSite is that it uses the composition of k-spaced amino acid pairs as the encoding scheme and a support vector machine as the predictor. CKSAAP_PhSite achieved a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43% for serine; a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31% for threonine; and a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21% for tyrosine. Results from cross-validation and an independent benchmark suggest that the method is promising for predicting phosphorylation sites and can serve as a useful supplementary tool for the community. For public access, CKSAAP_PhSite is available at
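The encoding scheme named in the abstract, the composition of k-spaced amino acid pairs, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `cksaap_encode`, the parameter `k_max`, the restriction to the 20 standard residues, and the normalization by the number of pair positions in the window are all assumptions made for illustration; the abstract does not state the window length or the range of k used by CKSAAP_PhSite.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted;
# any other character (e.g. a padding symbol) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap_encode(window, k_max=2):
    """Concatenate k-spaced pair frequencies for k = 0..k_max.

    For each k, a pair is formed by residue i and residue i + k + 1,
    i.e. two residues separated by k other residues.
    Returns a list of 400 * (k_max + 1) floats.
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_positions = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(max(n_positions, 0)):
            idx = PAIR_INDEX.get(window[i] + window[i + k + 1])
            if idx is not None:
                counts[idx] += 1
        denom = n_positions if n_positions > 0 else 1
        features.extend(c / denom for c in counts)
    return features


# Hypothetical 5-residue window centred on a serine.
vec = cksaap_encode("ASTSA", k_max=1)
```

Each candidate site would be represented by one such vector, which could then be passed to a support vector machine (for example, scikit-learn's `SVC`) trained on known phosphorylated and non-phosphorylated sites.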

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Binding Sites
  • Computational Biology / methods*
  • Phosphorylation
  • Proteins / chemistry*


Substances

  • Amino Acids
  • Proteins

Grant support

This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 60803102 and 61070084) and by the Natural Science Foundation of Jilin Province (Nos. 20101506 and 20110104). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.