Motivation: Systematic dissection of the ubiquitylation proteome is emerging as an appealing but challenging research topic because of the significant roles ubiquitylation play not only in protein degradation but also in many other cellular functions. High-throughput experimental studies using mass spectrometry have identified many ubiquitylation sites, primarily from eukaryotes. However, the vast majority of ubiquitylation sites remain undiscovered, even in well-studied systems. Because mass spectrometry-based experimental approaches for identifying ubiquitylation events are costly, time-consuming and biased toward abundant proteins and proteotypic peptides, in silico prediction of ubiquitylation sites is a potentially useful alternative strategy for whole proteome annotation. Because of various limitations, current ubiquitylation site prediction tools were not well designed to comprehensively assess proteomes.
Results: We present a novel tool known as UbiProber, specifically designed for large-scale predictions of both general and species-specific ubiquitylation sites. We collected proteomics data for ubiquitylation from multiple species from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates the information from key positions and key amino acid residues. Cross-validation tests reveal that UbiProber achieves some improvement over existing tools in predicting species-specific ubiquitylation sites. Moreover, independent tests show that UbiProber improves the areas under receiver operating characteristic curves by ~15% by using the Combined model.
Availability: The UbiProber server is freely available on the web at http://bioinfo.ncu.edu.cn/UbiProber.aspx. The software system of UbiProber can be downloaded at the same site.
Supplementary information: Supplementary data are available at Bioinformatics online.