Development of a sugar-binding residue prediction system from protein sequences using support vector machine

Comput Biol Chem. 2017 Feb:66:36-43. doi: 10.1016/j.compbiolchem.2016.10.009. Epub 2016 Nov 9.

Abstract

Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they do not effectively learn the varied properties of binding site residues that arise from the different kinds of interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor that combines the results of the two predictors. We showed that when a sugar is known to be acidic, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be nonacidic or its class is unknown, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix (PSSM) created by the position-specific iterative basic local alignment search tool (PSI-BLAST) was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open-source freeware tool in a GitHub repository (https://doi.org/10.5281/zenodo.61513).
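
As an illustration of how a per-residue feature vector might be derived from a position-specific scoring matrix and how five-fold cross-validation splits can be formed, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the feature vector is assembled, so the sliding-window encoding, the window half-width of 7, the zero padding at sequence ends, and the function names `window_features` and `k_fold_splits` are all assumptions made for this example.

```python
from typing import List, Tuple


def window_features(pssm: List[List[float]], center: int,
                    half_width: int = 7) -> List[float]:
    """Flatten the PSSM rows in a window centred on one residue.

    Assumption: a sliding window with zero padding beyond the sequence
    ends; the abstract does not specify the encoding actually used.
    """
    n_cols = len(pssm[0]) if pssm else 20  # 20 amino acid columns
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_cols)  # pad outside the sequence
    return features


def k_fold_splits(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k-fold cross-validation.

    Items are assigned to folds round-robin; each item is tested exactly once.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), test))
    return splits


if __name__ == "__main__":
    # Toy PSSM: 5 residues x 20 columns (values are made up for illustration).
    toy_pssm = [[float(i + 1)] * 20 for i in range(5)]
    vec = window_features(toy_pssm, center=0, half_width=1)
    print(len(vec))  # 3 window positions x 20 columns
    for train_idx, test_idx in k_fold_splits(10, k=5):
        print(test_idx, len(train_idx))
```

Each vector produced this way could be passed, together with a binding or non-binding label, to any support vector machine implementation. In practice the folds would more likely be formed over whole proteins than over individual residues, so that residues from one protein do not appear in both training and test sets; that choice is likewise an assumption here.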

Keywords: Carbohydrate; Machine learning; Sugar-binding proteins; Sugar-binding residue prediction; Support vector machine.

MeSH terms

  • Binding Sites
  • Carbohydrates / chemistry*
  • Cluster Analysis
  • Proteins / metabolism*
  • Support Vector Machine*

Substances

  • Carbohydrates
  • Proteins