A new representation for protein secondary structure prediction based on frequent patterns

Bioinformatics. 2006 Nov 1;22(21):2628-34. doi: 10.1093/bioinformatics/btl453. Epub 2006 Aug 29.

Abstract

Motivation: A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features in the prediction of the secondary structure of a protein sequence using support vector machines (SVMs).
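The abstract does not define the patterns or features precisely, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that patterns are contiguous amino acid substrings (the paper's patterns may differ, for example by allowing gaps), that the support of a pattern is the number of sequences containing it, and that each residue is described by binary indicators of which frequent patterns occur in a window centred on it. The names mine_frequent_patterns and window_features, and the parameters min_support, max_len and half_width, are hypothetical. The level-wise search works because support is anti-monotone: a pattern of length k can only be frequent if both of its length k-1 sub-patterns are frequent, so each level only counts candidates built from the previous level.

```python
from collections import Counter


def mine_frequent_patterns(sequences, min_support, max_len):
    """Level-wise (Apriori-style) search for frequent contiguous patterns.

    Assumption: a pattern is a contiguous substring, and its support is the
    number of sequences that contain it at least once.
    """
    # Level 1: single residues, counted once per sequence.
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    level = {p for p, c in counts.items() if c >= min_support}
    frequent = {p: counts[p] for p in level}

    k = 1
    while level and k < max_len:
        k += 1
        counts = Counter()
        for seq in sequences:
            seen = set()
            for i in range(len(seq) - k + 1):
                cand = seq[i:i + k]
                # Pruning: both (k-1)-length sub-patterns must be frequent.
                if cand[:-1] in level and cand[1:] in level:
                    seen.add(cand)
            counts.update(seen)  # each sequence adds at most 1 per pattern
        level = {p for p, c in counts.items() if c >= min_support}
        frequent.update({p: counts[p] for p in level})
    return frequent


def window_features(seq, pos, patterns, half_width):
    """Binary feature vector for residue `pos` (hypothetical encoding).

    Entry j is 1 if patterns[j] occurs inside the window of up to
    half_width residues on each side of pos, else 0.
    """
    start = max(0, pos - half_width)
    window = seq[start:pos + half_width + 1]
    return [1 if p in window else 0 for p in patterns]


if __name__ == "__main__":
    seqs = ["ACDEAC", "ACDF", "GACD"]
    freq = mine_frequent_patterns(seqs, min_support=2, max_len=3)
    patterns = sorted(freq)  # fixed feature order
    print(freq)
    print(window_features("ACDEAC", 4, patterns, half_width=1))
```

In this sketch, per-residue vectors produced by window_features could be passed to any SVM implementation (for example scikit-learn's `SVC`), with one class per secondary structure state; the feature sets, window size and kernel actually used in the paper are not stated in the abstract.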

Results: Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared with the predictions of PSI-PRED, PHD and PROFsec. Despite being trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to those of PSI-PRED and PROFsec. Finally, we show that the method contributes significant information to consensus predictions.

Availability: The method is available from the authors upon request.

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Proteins / ultrastructure
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins