Positive-Unlabeled Learning for Pupylation Sites Prediction

Biomed Res Int. 2016:2016:4525786. doi: 10.1155/2016/4525786. Epub 2016 Aug 7.

Abstract

Pupylation, a crucial posttranslational modification in prokaryotes, plays a key role in regulating various protein functions. To understand the molecular mechanism of pupylation, it is important to identify pupylation substrates and sites accurately. Because traditional experimental methods are time-consuming and labor-intensive, several computational methods have been developed to identify pupylation sites. Existing computational methods use the experimentally annotated pupylation sites as the positive training set and the remaining nonannotated lysine residues as the negative training set, and build classifiers to predict new pupylation sites in uncharacterized proteins. However, the remaining nonannotated lysine residues may contain pupylation sites that have not yet been experimentally validated, so treating them all as negatives can mislabel true sites. Unlike previous methods, this study used the experimentally annotated pupylation sites as the positive training set and the remaining nonannotated lysine residues as the unlabeled training set. A novel method named PUL-PUP was proposed to predict pupylation sites using a positive-unlabeled learning technique. Our experimental results indicated that PUL-PUP significantly outperforms the other methods for the prediction of pupylation sites. As an application, PUL-PUP was also used to predict the most likely pupylation sites among the nonannotated lysine sites.
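The positive-unlabeled setup described in the abstract can be illustrated with a minimal Python sketch. It is not the PUL-PUP implementation, whose details the abstract does not give. The function names (`lysine_windows`, `two_step_pu`), the window size, the `negative_fraction` parameter, and the nearest-centroid classifier are all assumptions chosen for illustration. The first function splits the lysine sites of one protein into a positive and an unlabeled set. The second shows one common generic positive-unlabeled strategy, a two-step scheme that first picks "reliable negatives" from the unlabeled set and then trains a classifier on positives versus those negatives. How sequence windows are encoded into numeric feature vectors is left out.

```python
from math import dist  # Euclidean distance, Python 3.8+


def lysine_windows(sequence, annotated_positions, flank=3):
    """Split the lysine (K) sites of one protein into positive and unlabeled sets.

    annotated_positions holds 1-based positions of experimentally annotated
    sites. Every other lysine is kept as unlabeled, not as negative.
    """
    padded = "-" * flank + sequence + "-" * flank  # pad so edge windows have full length
    positives, unlabeled = [], []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        window = padded[i : i + 2 * flank + 1]  # window centered on the lysine
        target = positives if (i + 1) in annotated_positions else unlabeled
        target.append((i + 1, window))
    return positives, unlabeled


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_step_pu(positive_vectors, unlabeled_vectors, negative_fraction=0.5):
    """Generic two-step positive-unlabeled sketch (illustrative assumption).

    Step 1: treat the unlabeled points farthest from the positive centroid
            as reliable negatives.
    Step 2: classify by nearest centroid (positives vs. reliable negatives).
    Returns a predict(vector) -> bool function.
    """
    positive_center = centroid(positive_vectors)
    ranked = sorted(
        unlabeled_vectors,
        key=lambda v: dist(v, positive_center),
        reverse=True,
    )
    count = max(1, int(len(ranked) * negative_fraction))
    negative_center = centroid(ranked[:count])

    def predict(vector):
        return dist(vector, positive_center) < dist(vector, negative_center)

    return predict


# Toy data, invented purely for illustration.
toy_positives, toy_unlabeled = lysine_windows("MKAKLK", {2}, flank=2)
predict = two_step_pu(
    [[1.0, 1.0], [1.2, 0.9]],
    [[1.1, 1.0], [5.0, 5.0], [5.2, 4.8], [0.9, 1.1]],
)
```

In this toy run, the unlabeled points near the positive centroid are left out of the reliable-negative set, so `predict` can still flag them as likely positives. That is the behavior a positive-unlabeled approach aims for when some nonannotated lysines are in fact unvalidated pupylation sites.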

MeSH terms

  • Algorithms
  • Binding Sites
  • Databases, Protein
  • Lysine / chemistry
  • Machine Learning*
  • Protein Processing, Post-Translational*
  • Proteins / chemistry
  • Proteins / metabolism
  • Support Vector Machine
  • Ubiquitination

Substances

  • Proteins
  • Lysine