A quadratic discriminant analysis of protein structure classification based on the Helix/Strand content

J Theor Biol. 1999 Dec 7;201(3):189-99. doi: 10.1006/jtbi.1999.1023.


Based on the 210 non-homologous proteins (domains) classified manually by Michie et al. (J. Mol. Biol. 262, 168-185, 1996), a new structure classification criterion of globular proteins relying on the content of helix/strand has been proposed, using a quadratic discriminant method. Each protein is classified into one of the three classes, i.e. those of alpha class, beta class and alphabeta class (including alpha/beta and alpha+beta classes). According to the new structure classification criterion, of the 210 proteins in the training set, 207 are correctly classified and thus the accuracy is 207/210=98.57%. Multiple cross-validation tests are performed. The jackknife test shows that of the 210 proteins 207 are correctly classified with an accuracy of 98.57%. To test the method further, of 3577 proteins (domains) extracted from SCOP, 91.39% of them are correctly reclassified by the new classification criterion. On average, the accuracy of the new criterion is about 8 percentage points higher than that of the criterion proposed by Nakashima et al. (J. Biochem. 99, 153-162, 1986). Our result shows that the classification based solely on structures is basically consistent with that combining both structural and evolutionary information. Further complete automated classification scheme should consider both structures and evolutionary relationship. The methodology presented provides an appropriate mathematical format to reach this goal.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Classification / methods
  • Discriminant Analysis
  • Evolution, Molecular
  • Humans
  • Protein Structure, Secondary*
  • Reproducibility of Results