Predicting protein subcellular localization by pseudo amino acid composition with a segment-weighted and features-combined approach

Protein Pept Lett. 2011 May;18(5):480-7. doi: 10.2174/092986611794927947.

Abstract

Information about protein subcellular location plays an important role in molecular cell biology, and predicting the subcellular location of proteins helps in understanding their functions and interactions. In this paper, a different mode of pseudo amino acid composition was proposed to represent protein samples for predicting their subcellular localization, using the following procedure: based on the optimal splice site of each protein sequence, we divided the sequence into a sorting signal part and a mature protein part, and extracted sequence features from each part separately. The combined features were then fed into an SVM classifier to perform the prediction. In a jackknife test on a benchmark dataset in which none of the included proteins has more than 90% pairwise sequence identity to any other, the method achieved overall accuracies of 94.5% for prokaryotic proteins and 90.3% for eukaryotic proteins. These results indicate that the prediction quality of our method is quite satisfactory, and it is anticipated that the method may serve as an alternative to existing prediction methods.
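The feature-extraction step described above (split each sequence at a cleavage position, encode the sorting signal part and the mature part separately, then concatenate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the physicochemical properties, the number of sequence-order terms (lam), the weight factor (w), the segment weights, or how the optimal splice site is found. Here a Chou-style type-1 pseudo amino acid composition is computed with a single property (the Kyte-Doolittle hydrophobicity scale, chosen only for illustration). The cleavage position `split` is assumed to be supplied externally, for example by a signal peptide predictor. The function names `pseaac` and `segment_weighted_features` and all default parameter values are hypothetical.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity; an assumed stand-in for the paper's (unstated) properties.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale: Dict[str, float]) -> Dict[str, float]:
    """Rescale a property to mean 0 and standard deviation 1 over the 20 residues."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {aa: (scale[aa] - mean) / std for aa in AMINO_ACIDS}


def pseaac(seq: str, lam: int = 3, w: float = 0.05,
           scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[float]:
    """Type-1 pseudo amino acid composition using a single property.

    Returns 20 composition terms followed by `lam` sequence-order terms,
    normalized so that the whole vector sums to 1.
    """
    # Non-standard residue letters are silently dropped.
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    if len(seq) <= lam:
        raise ValueError("segment must be longer than lam")
    h = standardize(scale)
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def segment_weighted_features(seq: str, split: int,
                              weights: Tuple[float, float] = (1.0, 1.0),
                              lam: int = 3, w: float = 0.05) -> List[float]:
    """Encode the sorting signal part (seq[:split]) and the mature part
    (seq[split:]) separately, scale each by a hypothetical segment weight,
    and concatenate them into one feature vector."""
    signal, mature = seq[:split], seq[split:]
    sig_vec = [weights[0] * x for x in pseaac(signal, lam, w)]
    mat_vec = [weights[1] * x for x in pseaac(mature, lam, w)]
    return sig_vec + mat_vec
```

With the defaults, each segment contributes 20 + lam = 23 values, so the combined vector has 46 entries. In the described method, vectors like this would then be passed to an SVM classifier and evaluated by a jackknife (leave-one-out) test. Neither of those steps is shown here.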

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / analysis
  • Amino Acids / classification
  • Eukaryotic Cells / chemistry*
  • Eukaryotic Cells / cytology
  • Prokaryotic Cells / chemistry*
  • Prokaryotic Cells / cytology
  • Protein Sorting Signals
  • Protein Splicing
  • Reproducibility of Results
  • Sequence Analysis, Protein / methods*
  • Stereoisomerism

Substances

  • Amino Acids
  • Protein Sorting Signals