Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA

BMC Bioinformatics. 2020 May 24;21(1):212. doi: 10.1186/s12859-020-3539-1.

Abstract

Background: Apoptosis, also called programmed cell death, is the spontaneous, orderly, gene-controlled death of cells that helps maintain a stable internal environment. Identifying the subcellular location of apoptosis proteins helps in understanding the mechanism of apoptosis and in designing drugs. Consequently, predicting the subcellular localization of apoptosis proteins has attracted increasing attention in computational biology, and effective feature extraction methods play a critical role in such predictions.

Results: In this paper, we proposed two novel feature extraction methods based on evolutionary information. The first obtains evolutionary information through the transition matrix of the consensus sequence (CTM). The second extracts evolutionary information from the position-specific scoring matrix (PSSM) using absolute entropy correlation analysis (AECA-PSSM). After the two kinds of features were fused, linear discriminant analysis (LDA) was used to reduce the dimension of the combined feature vector, and a support vector machine (SVM) was then used to predict protein subcellular locations. The resulting CTM-AECA-PSSM-LDA method was evaluated on the CL317 and ZW225 datasets. In the jackknife test, the overall accuracy was 99.7% on CL317 and 95.6% on ZW225.
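The abstract describes the CTM feature only at a high level. The sketch below is a minimal, hypothetical illustration, not the authors' implementation. It assumes that (1) the consensus sequence takes the highest-scoring residue in each PSSM row, (2) the PSSM columns follow the PSI-BLAST order ARNDCQEGHILKMFPSTWYV, and (3) the transition matrix holds residue-to-residue transition frequencies normalized by L-1 and flattened into a 400-dimensional vector. The function names `consensus_sequence` and `transition_matrix_features` are invented for illustration.

```python
from typing import List, Sequence

# Assumed PSSM column order (PSI-BLAST convention); verify against your own PSSM files.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def consensus_sequence(pssm: Sequence[Sequence[float]]) -> str:
    """Hypothetical rule: at each position, keep the residue with the highest PSSM score."""
    residues = []
    for row in pssm:
        if len(row) != 20:
            raise ValueError("each PSSM row must contain 20 scores")
        # max() returns the first maximal index, so ties go to the earlier column
        best = max(range(20), key=lambda j: row[j])
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


def transition_matrix_features(consensus: str) -> List[float]:
    """Flattened 20x20 matrix of transition frequencies between consecutive residues."""
    counts = [[0.0] * 20 for _ in range(20)]
    n_pairs = len(consensus) - 1
    # Count each transition a -> b between neighbouring positions
    for a, b in zip(consensus, consensus[1:]):
        counts[INDEX[a]][INDEX[b]] += 1.0
    # Normalize by the number of transitions (assumed normalization)
    if n_pairs > 0:
        counts = [[c / n_pairs for c in row] for row in counts]
    # Row-major flattening gives a 400-dimensional feature vector
    return [c for row in counts for c in row]
```

For a toy three-row PSSM whose maxima fall on A, R, and A, the consensus sequence is "ARA", and the two transitions A→R and R→A each receive a frequency of 0.5. In the described pipeline, a vector like this would be concatenated with the AECA-PSSM features before LDA and SVM are applied.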

Conclusions: The experimental results show that the proposed method can effectively extract richer features from protein sequences and is feasible for predicting the subcellular location of apoptosis proteins. It is expected to serve as a complementary tool to existing subcellular localization methods.

Keywords: Absolute entropy correlation analysis; Consensus sequence; Linear discriminant analysis; Position-specific scoring matrix; Subcellular location.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Apoptosis Regulatory Proteins / chemistry
  • Apoptosis Regulatory Proteins / metabolism*
  • Consensus Sequence
  • Databases, Protein
  • Discriminant Analysis*
  • Entropy
  • Evolution, Molecular*
  • Position-Specific Scoring Matrices
  • ROC Curve
  • Subcellular Fractions / metabolism
  • Support Vector Machine

Substances

  • Apoptosis Regulatory Proteins