Prediction of pupylation sites using the composition of k-spaced amino acid pairs

J Theor Biol. 2013 Nov 7:336:11-7. doi: 10.1016/j.jtbi.2013.07.009. Epub 2013 Jul 18.


Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at

Keywords: Feature selection; Pupylation; Software; Support vector machine; k-spaced amino acid pairs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acids / metabolism*
  • Databases, Protein
  • Molecular Sequence Data
  • Protein Processing, Post-Translational*
  • Proteins / chemistry
  • Reproducibility of Results
  • Software
  • Ubiquitins / metabolism


  • Amino Acids
  • Proteins
  • Ubiquitins