Computational phosphorylation site prediction in plants using random forests and organism-specific instance weights

Bioinformatics. 2013 Mar 15;29(6):686-94. doi: 10.1093/bioinformatics/btt031. Epub 2013 Jan 22.

Abstract

Motivation: Phosphorylation is the most important post-translational modification in eukaryotes. Although many computational phosphorylation site prediction tools exist for mammals, and a few were created specifically for Arabidopsis thaliana, none are currently available for other plants.

Results: In this article, we propose a novel random forest-based method called PHOSFER (PHOsphorylation Site FindER) that applies phosphorylation data from other organisms to enhance the accuracy of predictions in a target organism. As a test case, PHOSFER is applied to phosphorylation sites in soybean, and we show that it predicts soybean sites more accurately than both the existing Arabidopsis-specific predictors and a simpler machine-learning scheme that uses only known phosphorylation sites and non-phosphorylation sites from soybean. In addition to soybean, PHOSFER will be extended to other organisms in the near future.
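
The abstract does not spell out how the organism-specific instance weights named in the title enter the random forest. The following is a minimal sketch of one plausible reading, not the PHOSFER implementation: each training site carries a weight determined by its source organism, and a tree's split criterion (here, Gini impurity) is computed over weighted rather than raw counts. The weight values, the organism list, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical weights (NOT taken from the paper): sites from the target
# organism count fully, sites from other organisms count for less.
ORGANISM_WEIGHT = {"soybean": 1.0, "arabidopsis": 0.6, "human": 0.2}


def instance_weights(organisms, weight_by_organism=ORGANISM_WEIGHT):
    """Return one weight per training instance, looked up by source organism."""
    return [weight_by_organism[organism] for organism in organisms]


def weighted_gini(labels, weights):
    """Gini impurity where each instance contributes its weight, not a count of 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, weight in zip(labels, weights):
        mass[label] += weight
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


# Toy data: 1 = phosphorylated site, 0 = non-phosphorylated site.
organisms = ["soybean", "soybean", "arabidopsis", "human", "human"]
labels = [1, 1, 0, 0, 0]
weights = instance_weights(organisms)

unweighted = weighted_gini(labels, [1.0] * len(labels))
weighted = weighted_gini(labels, weights)
print(f"unweighted Gini: {unweighted:.3f}, organism-weighted Gini: {weighted:.3f}")
```

In this toy set the down-weighted non-soybean sites contribute less to the impurity, so a tree grown with these weights would favor splits that separate the soybean sites well. Libraries such as scikit-learn expose the same idea through a per-sample weight argument at fit time; whether PHOSFER uses weights in exactly this way is an assumption here.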

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Artificial Intelligence*
  • Cattle
  • Computational Biology / methods
  • Glycine max / metabolism
  • Humans
  • Mice
  • Phosphorylation
  • Plant Proteins / chemistry
  • Plant Proteins / metabolism*
  • Protein Processing, Post-Translational
  • Sequence Alignment
  • Sequence Analysis, Protein
  • Software*

Substances

  • Plant Proteins