Prediction of pupylation sites using the composition of k-spaced amino acid pairs

Chun-Wei Tung

doi:10.1016/j.jtbi.2013.07.009

J Theor Biol. 2013 Nov 7:336:11-7. doi: 10.1016/j.jtbi.2013.07.009. Epub 2013 Jul 18.

Author

Chun-Wei Tung¹

Affiliation

¹ School of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan; PhD Program in Toxicology, Kaohsiung Medical University, Kaohsiung 807, Taiwan. Electronic address: cwtung@kmu.edu.tw.

PMID: 23871866

Abstract

Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by the proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, the pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and to gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool, iPUP, is developed for the computational identification of pupylation sites. The composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top-ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. iPUP, trained by using support vector machines and the composition of the 150 top-ranked k-spaced amino acid pairs, achieves a 10-fold cross-validation performance of 0.83 for the area under the receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that pairs containing a terminal space are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to the C-terminus tend to be pupylated. In contrast, lysines close to the N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
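The composition of k-spaced amino acid pairs mentioned in the abstract can be illustrated with a minimal sketch. This is not the iPUP implementation: the abstract does not state the window size, the range of k, or the symbol used for a terminal space, so the flank of 10 residues, k up to 4, the "-" gap symbol, and the function names `extract_window` and `cksaap` are all assumptions made here for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"  # assumed symbol for a terminal space (positions past either terminus)
ALPHABET = AMINO_ACIDS + GAP


def extract_window(sequence, position, flank=10):
    """Return the peptide centred on the 0-based `position`.

    Positions that fall outside the protein are padded with GAP, so a
    lysine near a terminus yields pairs that contain a terminal space.
    The flank of 10 is an illustrative assumption, not a value from the paper.
    """
    left = sequence[max(0, position - flank):position]
    right = sequence[position + 1:position + 1 + flank]
    return (GAP * (flank - len(left)) + left + sequence[position]
            + right + GAP * (flank - len(right)))


def cksaap(peptide, k_max=4):
    """Composition of k-spaced pairs for k = 0..k_max.

    For each k, every pair (peptide[i], peptide[i + k + 1]) is counted and
    the count is divided by the number of such pairs in the peptide.
    Pairs containing a symbol outside ALPHABET (e.g. "X") are skipped.
    Returns a dict keyed by (k, pair); k_max=4 is an assumed default.
    """
    features = {}
    for k in range(k_max + 1):
        n_pairs = len(peptide) - k - 1
        counts = {a + b: 0 for a, b in product(ALPHABET, repeat=2)}
        for i in range(n_pairs):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        for pair, count in counts.items():
            features[(k, pair)] = count / n_pairs if n_pairs > 0 else 0.0
    return features


if __name__ == "__main__":
    # Toy sequence, not real data: encode the lysine at index 1.
    window = extract_window("MKAAK", 1, flank=2)
    vector = cksaap(window, k_max=1)
    nonzero = {key: value for key, value in vector.items() if value > 0}
    print(window, nonzero)
```

With 20 amino acids plus the gap symbol, each value of k contributes 21 × 21 = 441 features. A feature-selection step, such as the sequential backward elimination described in the abstract, would then reduce this vector to a smaller set of top-ranked pairs before training a classifier.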

Keywords: Feature selection; Pupylation; Software; Support vector machine; k-spaced amino acid pairs.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids / metabolism*
Databases, Protein
Molecular Sequence Data
Protein Processing, Post-Translational*
Proteins / chemistry
Reproducibility of Results
Software
Ubiquitins / metabolism

Substances

Amino Acids
Proteins
Ubiquitins