RUBI: rapid proteomic-scale prediction of lysine ubiquitination and factors influencing predictor performance

Amino Acids. 2014 Apr;46(4):853-62. doi: 10.1007/s00726-013-1645-3. Epub 2013 Dec 23.

Abstract

Post-translational modification of protein lysines has recently been shown to be a common feature of eukaryotic organisms. Ubiquitination is regarded as a versatile regulatory mechanism with many important cellular roles, and large-scale ubiquitination datasets are becoming available for H. sapiens. However, with current experimental techniques the vast majority of ubiquitination sites remain unidentified, and in silico tools may offer an alternative.

Here, we introduce Rapid UBIquitination (RUBI), a sequence-based ubiquitination predictor designed for rapid application on a genome scale. RUBI was constructed iteratively; at each iteration, we investigated important factors that influenced its performance and usability. The final RUBI model has an AUC of 0.868 on a large cross-validation set and is shown to outperform other available methods on independent test sets. Predicted intrinsic disorder is weakly anti-correlated with ubiquitination in the H. sapiens dataset and improves performance slightly. RUBI predicts the number of ubiquitination sites to within three sites for ca. 80% of the tested proteins. The average potentially ubiquitinated fraction of the proteome is predicted to be at least 25% across a variety of model organisms, including several thousand candidate H. sapiens proteins awaiting experimental characterization.

RUBI accurately predicts ubiquitination on unseen examples and shows a predictive signal across different eukaryotic organisms. The factors that influenced the construction of RUBI could also be tested in other post-translational modification predictors. One of the more interesting is the influence of intrinsic protein disorder: ubiquitinated lysines preferentially occur at residues with a low predicted disorder probability.
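The abstract reports three kinds of evaluation figures: an AUC over scored lysines, the share of proteins whose site count is predicted to within three sites, and the fraction of a proteome predicted to be ubiquitinated. The Python sketch below shows one way such figures can be computed from per-lysine scores and per-protein site counts. It is not RUBI's code, and several points are assumptions: the function names are hypothetical, the AUC uses the pairwise (Mann-Whitney) definition, "within three sites" is taken to mean an absolute count difference of at most three, and a protein is taken to be potentially ubiquitinated if it has at least one predicted site.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUC: the probability that a randomly chosen
    positive lysine scores higher than a randomly chosen negative one.
    Ties count as half a win. O(P * N), which is fine for a small sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fraction_count_within(predicted: Sequence[int], observed: Sequence[int],
                          tolerance: int = 3) -> float:
    """Fraction of proteins whose predicted number of sites differs from the
    observed number by at most `tolerance` (assumed reading of 'within three')."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)


def potentially_modified_fraction(site_counts: Sequence[int]) -> float:
    """Fraction of proteins with at least one predicted site
    (assumed definition of 'potentially ubiquitinated')."""
    if not site_counts:
        raise ValueError("need at least one protein")
    return sum(1 for c in site_counts if c > 0) / len(site_counts)


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only; these are not data from the study.
    print(round(roc_auc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]), 3))  # 0.889
    print(fraction_count_within([2, 5, 0, 10], [3, 1, 0, 4]))  # 0.5
    print(potentially_modified_fraction([0, 2, 0, 1]))  # 0.5
```

The pairwise AUC is chosen here because it makes the meaning of the statistic explicit; for large proteome-scale sets a rank-based or library implementation would be faster.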

Publication types

  • Evaluation Study

MeSH terms

  • Animals
  • Artificial Intelligence
  • Computational Biology / instrumentation
  • Computational Biology / methods*
  • Humans
  • Internet
  • Lysine / metabolism*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Proteomics / instrumentation
  • Proteomics / methods*
  • Software
  • Ubiquitination

Substances

  • Proteins
  • Lysine