Prediction of protein structural class based on symmetrical recurrence quantification analysis

Comput Biol Chem. 2021 Jun:92:107450. doi: 10.1016/j.compbiolchem.2021.107450. Epub 2021 Feb 8.

Abstract

Predicting the structural class of proteins with low-similarity sequences is a significant challenge and has been studied extensively. It plays an important role in drug design, protein fold recognition, functional analysis and several other biological applications. In this paper, we applied our proposed method for predicting protein structural class to two benchmark datasets from the literature: (1) 25PDB and (2) 1189. First, we transformed the protein sequences into DNA sequences and then into binary sequences. We then applied symmetrical recurrence quantification analysis, a new approach, extracting 8 features from each symmetry plot computation. Three machine learning algorithms, Linear Discriminant Analysis (LDA), Random Forest (RF) and Support Vector Machine (SVM), were used as classifiers and compared to identify the best one for protein structural class prediction. The results show that symmetrical recurrence quantification as the feature extraction method, combined with the RF classifier, outperformed existing methods with an overall accuracy of 100% without overfitting.
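The abstract does not specify the protein-to-DNA mapping, the binary encoding, or how the 8 symmetry-plot features are defined. The sketch below therefore only illustrates the general shape of such a pipeline under loudly stated assumptions: a hypothetical one-codon-per-amino-acid reverse translation (`CODON`), a hypothetical 2-bit nucleotide code (`BITS`), and two standard recurrence quantification measures (recurrence rate and determinism) computed from a conventional recurrence plot. It is not the symmetrical RQA used in the paper and does not reproduce its features. All function names, the embedding dimension, delay and threshold are illustrative choices.

```python
import numpy as np

# HYPOTHETICAL reverse-translation table: one representative codon per amino acid.
# The paper's actual protein-to-DNA mapping is not given in the abstract.
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}
# HYPOTHETICAL 2-bit nucleotide encoding.
BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def protein_to_binary(seq):
    """Protein -> DNA (via CODON) -> binary vector (via BITS)."""
    dna = "".join(CODON[aa] for aa in seq.upper())
    return np.array([int(b) for nt in dna for b in BITS[nt]], dtype=int)


def embed(x, dim, delay):
    """Time-delay embedding: row i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)


def recurrence_matrix(x, dim=3, delay=1, eps=0.5):
    """Conventional (not symmetrical) recurrence plot with Chebyshev distance."""
    v = embed(x.astype(float), dim, delay)
    d = np.abs(v[:, None, :] - v[None, :, :]).max(axis=2)
    return (d <= eps).astype(int)


def recurrence_rate(R):
    """Fraction of recurrent points (main diagonal included)."""
    return R.mean()


def determinism(R, lmin=2):
    """Share of recurrent points lying on diagonal lines of length >= lmin,
    excluding the main diagonal (line of identity)."""
    n = R.shape[0]
    total, in_lines = 0, 0
    for k in range(-(n - 1), n):
        if k == 0:
            continue
        diag = np.diagonal(R, offset=k)
        run = 0
        for val in np.append(diag, 0):  # trailing 0 closes the last run
            if val:
                run += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
        total += diag.sum()
    return in_lines / total if total else 0.0


if __name__ == "__main__":
    x = protein_to_binary("MKV")
    R = recurrence_matrix(x)
    print(x.size, R.shape, recurrence_rate(R), determinism(R))
```

In a full pipeline, measures like these would form a per-protein feature vector passed to classifiers such as LDA, RF or SVM; the threshold `eps=0.5` on binary data means two embedded vectors recur only when they are identical.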

Keywords: LDA; Machine learning; Protein structural classes; Random Forest; Recurrence plot; SVM; Symmetrical recurrence quantification analysis; Symmetry.

Publication types

  • Review

MeSH terms

  • Algorithms*
  • Computational Biology*
  • Databases, Protein
  • Humans
  • Protein Conformation
  • Proteins / chemistry*
  • Sequence Analysis, Protein*

Substances

  • Proteins