RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information

Proteomics. 2009 May;9(9):2324-42. doi: 10.1002/pmic.200700597.

Abstract

The attainment of a complete map-based sequence for rice (Oryza sativa) is a major milestone for the research community. Identifying the localization of encoded proteins is key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular localization prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid, dipeptide (i+1) and four-part amino acid composition, achieving overall accuracies of 81.43, 80.88 and 81.10%, respectively. Second, a similarity search-based module was developed using the position-specific iterated basic local alignment search tool (PSI-BLAST) and achieved 68.35% accuracy. Another module, developed using evolutionary information of a protein sequence extracted from the position-specific scoring matrix (PSSM), achieved an accuracy of 87.10%. Further modules were developed using various encoding schemes such as higher-order dipeptide composition, N- and C-terminal composition, split amino acid composition and hybrid information. To benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, WoLF PSORT, PA-SUB, Plant-PLoc and ESLpred. To assist the plant research community, an online web tool, 'RSLpred', has been developed for subcellular localization prediction of query rice proteins; it is freely accessible at http://www.imtech.res.in/raghava/rslpred.
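
The three composition-based encodings named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the percentage normalization, the handling of non-standard residues and the equal-length four-way split are all assumptions made for illustration, and the abstract does not specify them.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes), in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def clean(seq):
    """Upper-case the sequence and drop non-standard residues (assumption)."""
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def aa_composition(seq):
    """Percentage of each of the 20 amino acids: a 20-dimensional vector."""
    seq = clean(seq)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq, gap=1):
    """Percentage of each residue pair (i, i+gap): a 400-dimensional vector.

    gap=1 corresponds to adjacent residues, i.e. the (i+1) dipeptides;
    larger gaps give one possible reading of "higher-order" dipeptides.
    """
    seq = clean(seq)
    pairs = [seq[i] + seq[i + gap] for i in range(len(seq) - gap)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * pairs.count(d) / len(pairs) for d in DIPEPTIDES]


def split_composition(seq, parts=4):
    """Amino acid composition of each of `parts` roughly equal segments,
    concatenated: an 80-dimensional vector for parts=4 (assumed scheme)."""
    seq = clean(seq)
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(aa_composition(seq[start:end]))
    return features
```

Each function returns a fixed-length numeric vector, which is the form an SVM classifier expects as input; training, kernel choice and the PSSM-based features are outside the scope of this sketch.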

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Artificial Intelligence
  • Computer Simulation
  • Cytoplasm / chemistry
  • Databases, Protein
  • Evolution, Molecular
  • Internet
  • Organelles / chemistry
  • Oryza / chemistry*
  • Oryza / genetics*
  • Plant Proteins / analysis*
  • Plant Proteins / chemistry
  • Plant Proteins / genetics
  • Proteomics / methods*
  • Reproducibility of Results
  • Sequence Analysis, Protein
  • User-Computer Interface

Substances

  • Plant Proteins