SOHPRED: a new bioinformatics tool for the characterization and prediction of human S-sulfenylation sites

Mol Biosyst. 2016 Aug 16;12(9):2849-58. doi: 10.1039/c6mb00314a.


Protein S-sulfenylation (SOH) is a post-translational modification in which cysteine thiols are oxidized to sulfenic acids. It acts as a redox switch that modulates diverse cellular processes and plays important roles in signal transduction, protein folding and enzymatic catalysis. Reversible SOH is also a key component in maintaining redox homeostasis, and redox imbalance has implicated it in a variety of human diseases, such as cancer, diabetes, and atherosclerosis. Despite its significance, the in situ trapping of the entire 'sulfenome' remains a major challenge. Yang et al. recently identified about 1000 SOH sites experimentally, providing an enriched benchmark SOH dataset. In this work, we developed a new ensemble learning tool, SOHPRED, for identifying protein SOH sites based on the compositions of enriched amino acids and the physicochemical properties of residues surrounding SOH sites. SOHPRED was built from four complementary predictors, i.e. a naive Bayesian predictor, a random forest predictor and two support vector machine predictors, whose training features are, respectively, amino acid occurrences, physicochemical properties, frequencies of k-spaced amino acid pairs and sequence profiles. In 5-fold cross-validation and independent tests, SOHPRED achieved AUC values of 0.784 and 0.799, respectively, outperforming several previously developed tools. As a practical application of SOHPRED, we predicted potential SOH sites for 193 S-sulfenylated substrates that had been experimentally detected through global sulfenome profiling in living cells but whose actual SOH sites were not determined. The SOHPRED web server has been made publicly available for the wider research community. The source code and the benchmark datasets can be downloaded from the website.
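One of the feature types named in the abstract, the frequencies of k-spaced amino acid pairs, can be illustrated with a minimal sketch. It is not taken from the SOHPRED source code. The abstract does not state the window length, the range of k, or the normalization, so those choices are assumptions here, as are the function names and the example peptide.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors line up.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def k_spaced_pair_frequencies(window, k):
    """Frequency of each ordered residue pair separated by exactly k residues.

    Assumption: counts are normalized by the number of pair positions
    available in the window, which is len(window) - k - 1.
    """
    counts = dict.fromkeys(PAIRS, 0)
    n_positions = max(len(window) - k - 1, 0)
    for i in range(n_positions):
        pair = window[i] + window[i + k + 1]
        if pair in counts:  # skip padding or non-standard residues
            counts[pair] += 1
    total = max(n_positions, 1)  # avoid division by zero for short windows
    return {pair: count / total for pair, count in counts.items()}


def encode_window(window, max_k=2):
    """Concatenate the pair frequencies for k = 0..max_k into one vector."""
    features = []
    for k in range(max_k + 1):
        freqs = k_spaced_pair_frequencies(window, k)
        features.extend(freqs[pair] for pair in PAIRS)
    return features


# Hypothetical 21-residue window centered on a cysteine (not real data).
example_window = "AKLMSTPEGRCDEFGHIKLMN"
vector = encode_window(example_window, max_k=2)
print(len(vector))  # 3 values of k x 400 pairs = 1200 features
```

A vector of this kind could then be passed to a classifier such as a support vector machine. Which of the four predictors uses which feature set follows the order given in the abstract, but the exact pipeline is not described there.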

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Bayes Theorem
  • Catalysis
  • Computational Biology / methods*
  • Cysteine / chemistry
  • Cysteine / metabolism*
  • Datasets as Topic
  • Humans
  • Oxidation-Reduction
  • Peptides / chemistry
  • Peptides / metabolism
  • Position-Specific Scoring Matrices
  • Protein Folding
  • Protein Processing, Post-Translational*
  • ROC Curve
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sulfenic Acids / chemistry
  • Sulfhydryl Compounds / chemistry
  • Support Vector Machine
  • Web Browser

Substances

  • Peptides
  • Sulfenic Acids
  • Sulfhydryl Compounds
  • Cysteine