Positive-Unlabeled Learning for Pupylation Sites Prediction

Ming Jiang; Jun-Zhe Cao

doi:10.1155/2016/4525786


Biomed Res Int. 2016;2016:4525786. doi: 10.1155/2016/4525786. Epub 2016 Aug 7.

Authors

Ming Jiang¹, Jun-Zhe Cao²

Affiliations

¹ School of Electronic Engineering, Dongguan University of Technology, Dongguan 523808, China.
² School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.

Abstract

Pupylation, a crucial posttranslational modification in prokaryotes, plays a key role in regulating various protein functions. To understand the molecular mechanism of pupylation, it is important to identify pupylation substrates and sites accurately. Because traditional experimental methods are time-consuming and labor-intensive, several computational methods have been developed to identify pupylation sites. These existing methods use the experimentally annotated pupylation sites as the positive training set and the remaining nonannotated lysine residues as the negative training set, and build classifiers from them to predict new pupylation sites in uncharacterized proteins. However, the nonannotated lysine residues may include true pupylation sites that have not yet been experimentally validated, so labeling all of them as negative can introduce errors into the training set. Unlike previous methods, this study used the experimentally annotated pupylation sites as the positive training set and the remaining nonannotated lysine residues as the unlabeled training set. A novel method named PUL-PUP was proposed to predict pupylation sites using a positive-unlabeled learning technique. Our experimental results indicated that PUL-PUP significantly outperforms the other methods in predicting pupylation sites. As an application, PUL-PUP was also used to predict the most likely pupylation sites among the nonannotated lysine residues.
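The positive-unlabeled setup described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the PUL-PUP implementation: the lysine-centered window (`half=10`, i.e. 21 residues), the `X` padding, the amino-acid-composition features, the 30% reliable-negative fraction, and every function name are hypothetical choices made here. Annotated lysines form the positive set and every other lysine forms the unlabeled set; a generic two-step PU heuristic then takes the unlabeled windows least similar to the positive centroid as reliable negatives and scores each candidate by its relative distance to the two centroids. The MeSH terms for this record include support vector machines; a nearest-centroid scorer stands in here only to keep the sketch self-contained.

```python
from collections import Counter

# The 20 standard amino acids; padding characters ("X") are ignored in features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def lysine_windows(seq, half=10, pad="X"):
    """Yield (position, window) for every lysine (K) in seq.

    Each window has length 2*half + 1 and is centered on the lysine;
    sequence ends are padded with `pad` (hypothetical convention).
    """
    padded = pad * half + seq + pad * half
    for i, aa in enumerate(seq):
        if aa == "K":
            # seq[i] sits at padded[i + half], so this slice is centered on it.
            yield i, padded[i:i + 2 * half + 1]


def split_positive_unlabeled(proteins, half=10):
    """Split lysine windows into positive (annotated) and unlabeled sets.

    proteins: dict mapping name -> (sequence, set of annotated 0-based K positions).
    """
    positives, unlabeled = [], []
    for name, (seq, sites) in proteins.items():
        for pos, win in lysine_windows(seq, half):
            target = positives if pos in sites else unlabeled
            target.append((name, pos, win))
    return positives, unlabeled


def composition(window):
    """Amino-acid composition of a window (fractions over standard residues)."""
    counts = Counter(window)
    n = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / n for a in AMINO_ACIDS]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def two_step_pu(pos_vecs, unl_vecs, neg_fraction=0.3):
    """Generic two-step PU heuristic (not PUL-PUP); returns a scoring function.

    Step 1: treat the unlabeled vectors farthest from the positive centroid
            as reliable negatives.
    Step 2: score a vector by how much closer it is to the positive centroid
            than to the reliable-negative centroid (higher = more site-like).
    """
    c_pos = centroid(pos_vecs)
    ranked = sorted(unl_vecs, key=lambda v: sqdist(v, c_pos), reverse=True)
    k = max(1, int(len(ranked) * neg_fraction))
    c_neg = centroid(ranked[:k])

    def score(v):
        return sqdist(v, c_neg) - sqdist(v, c_pos)

    return score


if __name__ == "__main__":
    # Toy data: one made-up protein with a single annotated site at position 1.
    proteins = {"toy1": ("MKAKDEKLLKQK", {1})}
    P, U = split_positive_unlabeled(proteins, half=2)
    score = two_step_pu([composition(w) for _, _, w in P],
                        [composition(w) for _, _, w in U])
    # Rank unlabeled lysines from most to least site-like.
    for name, pos, win in sorted(U, key=lambda t: score(composition(t[2])), reverse=True):
        print(name, pos, win, round(score(composition(win)), 3))
```

Higher scores mark unlabeled lysines that resemble the annotated sites more closely, which loosely parallels using a PU model to rank the most likely pupylation sites among nonannotated lysines; a practical predictor would replace the centroid scorer with a stronger classifier and richer sequence encodings.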

MeSH terms

Algorithms
Binding Sites
Databases, Protein
Lysine / chemistry
Machine Learning*
Protein Processing, Post-Translational*
Proteins / chemistry
Proteins / metabolism
Support Vector Machine
Ubiquitination

Substances

Proteins
Lysine