PairProSVM: protein subcellular localization based on local pairwise profile alignment and SVM

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):416-22. doi: 10.1109/TCBB.2007.70256.

Abstract

The subcellular locations of proteins are important functional annotations, and an effective, reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, that automatically predicts the subcellular locations of proteins. Profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and on amino acid composition, even when most of the homologous sequences have been removed. The paper also demonstrates that the performance of PairProSVM is sensitive, and roughly proportional, to the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on Reinhardt and Hubbard's, Huang and Li's, and Gardy et al.'s protein datasets. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the other methods compared in the paper.
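The role of Mercer's condition can be illustrated with a small numerical sketch. This is not the authors' implementation: the score matrix below is made-up toy data rather than real PSI-BLAST profile-alignment output, the linear kernel on score vectors is an assumed choice, and `mercer_degree` is a hypothetical measure (the fraction of non-negative eigenvalues) that the abstract does not define. The sketch only shows that a raw symmetric score matrix need not be positive semidefinite, whereas an inner-product kernel built from the score vectors always is.

```python
import numpy as np

# Toy symmetric matrix of pairwise alignment scores (made-up values).
# Row i is treated as the feature vector of training sequence i.
scores = np.array([
    [10.0, 2.0, 8.0, 1.0],
    [2.0, 9.0, 1.0, 7.0],
    [8.0, 1.0, 3.0, 2.0],
    [1.0, 7.0, 2.0, 2.0],
])


def linear_kernel(features):
    """Inner products between score vectors (assumed kernel choice)."""
    return features @ features.T


def mercer_degree(matrix, tol=1e-9):
    """Fraction of non-negative eigenvalues (hypothetical measure)."""
    sym = (matrix + matrix.T) / 2.0        # guard against round-off asymmetry
    eigvals = np.linalg.eigvalsh(sym)      # real eigenvalues, ascending order
    scale = max(1.0, float(np.abs(eigvals).max()))
    return float(np.mean(eigvals >= -tol * scale))


kernel = linear_kernel(scores)

# Using the raw scores directly as a kernel violates Mercer's condition here,
# while the inner-product kernel is positive semidefinite by construction.
print("raw scores:", mercer_degree(scores))
print("linear kernel:", mercer_degree(kernel))
```

A matrix such as `kernel` could then be passed to any SVM implementation that accepts a precomputed kernel.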

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Artificial Intelligence
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Software*
  • Structure-Activity Relationship
  • Subcellular Fractions / metabolism*
  • Tissue Distribution

Substances

  • Proteins