Support vector machines for prediction of protein signal sequences and their cleavage sites

Yu-Dong Cai; Shuo-liang Lin; Kuo-Chen Chou

doi:10.1016/s0196-9781(02)00289-9

Peptides. 2003 Jan;24(1):159-61. doi: 10.1016/s0196-9781(02)00289-9.

Authors

Yu-Dong Cai¹, Shuo-liang Lin, Kuo-Chen Chou

Affiliation

¹ Shanghai Research Centre of Biotechnology, Chinese Academy of Sciences, Shanghai 200233, China. y.cai@umist.ac.uk

PMID: 12576098

Abstract

Given a nascent protein sequence, how can one predict its signal peptide, or "ZIP code" sequence? Solving this problem is important for scientists who wish to use signal peptides as vehicles for finding new drugs or for reprogramming cells for gene therapy (see, e.g., K.C. Chou, Current Protein and Peptide Science 2002;3:615-22). In this paper, support vector machines (SVMs), a new machine learning method, are applied to this problem. The overall rate of correct prediction for 1939 secretory proteins and 1440 nonsecretory proteins was over 91%. It has not escaped our attention that the new method may also serve as a useful tool for further investigating the many unclear details of the molecular mechanism of the ZIP code protein-sorting system in cells.
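
To make the general idea concrete, the following is a minimal sketch of how a linear SVM might separate secretory-like from nonsecretory-like sequence windows. It is an illustration under stated assumptions, not the method reported in the paper: the abstract does not specify the sequence encoding, window length, kernel, or training procedure. The one-hot encoding, the 6-residue window, the toy sequences, and the helper names (`one_hot`, `train_linear_svm`, `predict`, `overall_accuracy`) are all hypothetical, and the optimiser is a plain batch subgradient descent on the soft-margin objective, without a bias term.

```python
# Toy sketch only: encoding, window length, data, and optimiser are assumptions,
# not the procedure used in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 6  # hypothetical window length, chosen for brevity


def one_hot(window):
    """Encode a fixed-length residue window as a flat 0/1 vector."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(window)}")
    vec = [0.0] * (WINDOW * len(AMINO_ACIDS))
    for pos, residue in enumerate(window):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, lam=0.1, epochs=50):
    """Minimise lam/2*||w||^2 + mean hinge loss by batch subgradient descent."""
    n, dim = len(xs), len(xs[0])
    w = [0.0] * dim
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # decaying step size (Pegasos-style schedule)
        # Examples with margin below 1 contribute to the hinge subgradient.
        violators = [(x, y) for x, y in zip(xs, ys) if y * dot(w, x) < 1.0]
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the regulariser
        for x, y in violators:
            for k, xk in enumerate(x):
                if xk:
                    w[k] += eta * y * xk / n
    return w


def predict(w, window):
    """Return +1 (secretory-like) or -1 (nonsecretory-like)."""
    return 1 if dot(w, one_hot(window)) > 0 else -1


def overall_accuracy(predicted, actual):
    """Fraction of all examples, across both classes, predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up windows: hydrophobic-rich vs. charged/polar (not real proteins).
secretory_like = ["LLLVLA", "ALLLVL", "LVLLAL"]
nonsecretory_like = ["DEKSDE", "KEDDSK", "SDEKEK"]

windows = secretory_like + nonsecretory_like
labels = [1] * len(secretory_like) + [-1] * len(nonsecretory_like)

weights = train_linear_svm([one_hot(s) for s in windows], labels)
predictions = [predict(weights, s) for s in windows]
print("training accuracy:", overall_accuracy(predictions, labels))
```

The `overall_accuracy` helper mirrors the kind of figure quoted in the abstract: correct predictions over both classes divided by the total number of proteins. A realistic experiment would measure it on held-out data, for example by cross-validation, and would likely use an established SVM library with a nonlinear kernel.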

MeSH terms

Genetic Vectors*
Hydrolysis
Protein Sorting Signals*
Proteins / chemistry
Proteins / genetics
Proteins / metabolism*

Substances

Protein Sorting Signals
Proteins