Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC

J Theor Biol. 2018 Oct 7;454:139-145. doi: 10.1016/j.jtbi.2018.05.033. Epub 2018 Jun 2.

Abstract

In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named the Total Sum of Squares (TSS). This quantity is the sum of the squared differences between the amino acid property values and their arithmetic mean. As an extension, the amino acid-type and amino acid-group formalisms are used to describe zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a Nearest Neighbor model was built to predict the four major protein structural classes. This model achieves a success rate of 98.53% in the jackknife cross-validation test, a performance superior to that of other reported methods despite the simplicity of the predictor. Additionally, the predictor has an average success rate of 98.35% across the different cross-validation tests performed. A Kappa statistic of 0.98 clearly distinguishes this model from a random predictor. The results obtained with the Nearest Neighbor model demonstrate that the proposed descriptors not only reflect relevant biochemical information related to protein structural classes but also provide appropriate interpretability. It can thus be expected that the current method may complement other existing approaches for predicting protein structural classes and other protein attributes.
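The abstract defines TSS only in words: for property values p_i along a sequence with arithmetic mean p̄, TSS = Σ_i (p_i − p̄)². The minimal Python sketch below shows how such descriptors and the evaluation could be computed. It rests on assumptions the abstract does not state: the Kyte-Doolittle hydropathy scale stands in for the paper's unspecified property scales; a local descriptor sums squared deviations from the global mean over the residues of one zone (a single amino acid type or a group); the Nearest Neighbor model is a 1-NN with Euclidean distance; and the Kappa statistic is Cohen's kappa. All function names are hypothetical, not the author's code.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, used only as an example property.
# ASSUMPTION: the paper's actual property scales are not given in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def tss(values):
    """Total Sum of Squares: sum of squared deviations from the arithmetic mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def global_tss(seq, scale=HYDROPATHY):
    """Global 0D descriptor: TSS of the property over every residue.
    Raises KeyError for residue letters missing from `scale`."""
    return tss([scale[aa] for aa in seq])

def local_tss(seq, zone, scale=HYDROPATHY):
    """Local descriptor: squared deviations from the *global* mean, summed only
    over residues whose letter is in `zone` (one amino acid type or a group).
    ASSUMPTION: using the global mean is this sketch's choice; with a zone's own
    mean, a single-type zone would always give 0. With the global mean, the
    local terms of any partition of the alphabet add up to the global TSS."""
    values = [scale[aa] for aa in seq]
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((scale[aa] - mean) ** 2 for aa in seq if aa in zone)

def nearest_neighbor_jackknife(features, labels):
    """Jackknife (leave-one-out) test of a 1-nearest-neighbor classifier:
    each sample is labeled by its closest *other* sample (squared Euclidean
    distance). Needs at least two samples. Returns the predicted labels."""
    preds = []
    for i, x in enumerate(features):
        best_j, best_d = None, float("inf")
        for j, y in enumerate(features):
            if i == j:
                continue  # leave the query sample out
            d = sum((a - b) ** 2 for a, b in zip(x, y))
            if d < best_d:
                best_j, best_d = j, d
        preds.append(labels[best_j])
    return preds

def cohen_kappa(true, pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(true)
    observed = sum(t == p for t, p in zip(true, pred)) / n
    ct, cp = Counter(true), Counter(pred)
    expected = sum(ct[c] * cp[c] for c in ct) / (n * n)
    if expected == 1.0:
        return 1.0  # both label lists are one identical class: perfect agreement
    return (observed - expected) / (1 - expected)
```

For example, a hypothetical feature vector for one protein could be `[global_tss(seq), local_tss(seq, "AILMFVW"), local_tss(seq, "DEKR")]`, with one such vector per protein passed to `nearest_neighbor_jackknife`; the jackknife success rate is then the fraction of predictions that match the true classes, and `cohen_kappa` compares those predictions with the true classes. The residue groups shown are illustrative only.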

Keywords: 0D-protein descriptor; Amino acid-group; Amino acid-type; Nearest Neighbor; Protein structural classes; Total sum of squares.

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Amino Acids / chemistry
  • Amino Acids / classification
  • Computational Biology / methods*
  • Databases, Protein
  • Internet
  • Models, Molecular
  • Models, Theoretical
  • Molecular Conformation
  • Proteins / chemistry*
  • Proteins / classification
  • Software
  • User-Computer Interface

Substances

  • Amino Acids
  • Proteins