Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix

Oncotarget. 2016 Dec 13;7(50):82440-82449. doi: 10.18632/oncotarget.12517.

Abstract

Self-interacting proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because experimental methods for identifying self-interacting proteins are limited, developing an effective sequence-based computational method to detect SIPs is important. In this study, we propose a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and Bi-gram probability (BIGP) to predict SIPs from protein sequence. The prediction model consists of the following steps: (1) an effective feature extraction method, BIGP, is used to represent protein sequences on a Position Specific Scoring Matrix (PSSM); (2) Principal Component Analysis (PCA) is employed to integrate the useful information and reduce the influence of noise; (3) the robust RVM classifier is used to carry out classification. On yeast and human datasets, the proposed RVMBIGP model achieves accuracies of 95.48% and 98.80%, respectively. These experimental results show that our method is promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate future proteomics research, the RVMBIGP server is freely available for academic use at http://219.219.62.123:8888/RVMBIGP.
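Steps (1) and (2) of the pipeline can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the RVMBIGP implementation. It assumes the PSSM is an L x 20 matrix already produced by PSI-BLAST. It also assumes a logistic normalization of the raw scores and uses the bi-gram definition common in the PSSM literature, where B[m][n] sums P[i][m] * P[i+1][n] over consecutive residues to give 400 features. The function names `normalize_pssm`, `bigram_features` and `pca_reduce` are hypothetical. The RVM classifier of step (3) is not shown, because neither NumPy nor scikit-learn ships one.

```python
import numpy as np


def normalize_pssm(pssm):
    """Map raw PSSM log-odds scores into (0, 1) with a logistic function.

    Assumption: the normalization actually used by RVMBIGP may differ.
    """
    return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))


def bigram_features(pssm):
    """Bi-gram probability features from an L x 20 PSSM.

    B[m, n] = sum over i of P[i, m] * P[i + 1, n], flattened to 400 values.
    """
    p = np.asarray(pssm, dtype=float)
    if p.ndim != 2 or p.shape[1] != 20:
        raise ValueError("expected an L x 20 PSSM")
    if p.shape[0] < 2:
        raise ValueError("need at least two residues")
    # Summing outer products of consecutive rows equals P[:-1]^T @ P[1:].
    b = p[:-1].T @ p[1:]
    return b.ravel()


def pca_reduce(x, n_components):
    """Project rows of x onto their top principal components (SVD of centered data)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Usage on random stand-in matrices (hypothetical data, not real PSSMs).
rng = np.random.default_rng(0)
pssms = [rng.normal(size=(int(rng.integers(50, 120)), 20)) for _ in range(10)]
features = np.vstack([bigram_features(normalize_pssm(m)) for m in pssms])
reduced = pca_reduce(features, n_components=5)
print(features.shape, reduced.shape)  # (10, 400) (10, 5)
```

Computing the bi-gram matrix as a single matrix product avoids an explicit loop over residue positions, and the number of retained principal components is a free parameter that would need tuning on real data.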

Keywords: cancer; disease; position-specific scoring matrix; protein self-interaction.

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein
  • Fungal Proteins / chemistry*
  • Fungal Proteins / classification
  • Humans
  • Position-Specific Scoring Matrices*
  • Principal Component Analysis
  • Protein Interaction Mapping / methods*
  • Sequence Analysis, Protein
  • Support Vector Machine*

Substances

  • Fungal Proteins