Support-vector-machine classification of linear functional motifs in proteins

J Mol Model. 2006 Mar;12(4):453-61. doi: 10.1007/s00894-005-0070-2. Epub 2005 Dec 10.

Abstract

We present an algorithm that predicts short linear functional motifs in proteins using only sequence information. Statistical models of these motifs are built from a database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database; each fragment is experimentally confirmed to carry a single-residue post-translational modification. To make a prediction, the query protein sequence is dissected into short overlapping fragments, each fragment is represented as a vector, and each vector is classified by a machine learning algorithm (Support Vector Machine) as potentially modifiable or not. The resulting list of plausible post-translational modification sites in the query protein is returned to the user. The sensitivity of the classification for the various types of short linear motifs is around 70%. We also present a study of the human protein kinase C family as a biological application of our method.
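
The fragment-and-vectorize step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fragment length or the vector encoding, so the window width of 9 residues, the one-hot encoding, and the function names `fragments` and `one_hot` are all assumptions made for illustration and may differ from the authors' implementation.

```python
# Illustrative sketch only: window width and one-hot encoding are assumed,
# not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def fragments(sequence, width=9):
    """Return (start_position, fragment) for every overlapping window.

    Windows advance one residue at a time; start positions are 1-based.
    A sequence shorter than `width` yields an empty list.
    """
    sequence = sequence.upper()
    return [
        (i + 1, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 list with 20 entries per position.

    Residues outside the 20 standard amino acids become all-zero blocks.
    """
    vector = []
    for residue in fragment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    for start, frag in fragments("MKTSAYLL", width=5):
        print(start, frag, sum(one_hot(frag)))
```

Each resulting vector could then be passed to a trained Support Vector Machine (for example, a general-purpose SVM library) to label the fragment as potentially modifiable or not; that training and classification step is omitted here because the abstract gives no kernel or parameter details.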

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Genetic
  • Humans
  • Models, Biological
  • Phosphorylation
  • Proteins / chemistry
  • Proteins / classification*
  • Proteins / metabolism*

Substances

  • Proteins