A new representation for protein secondary structure prediction based on frequent patterns

Fabian Birzele; Stefan Kramer

doi:10.1093/bioinformatics/btl453


Bioinformatics. 2006 Nov 1;22(21):2628-34. doi: 10.1093/bioinformatics/btl453. Epub 2006 Aug 29.

Authors

Fabian Birzele¹, Stefan Kramer

Affiliation

¹ Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstrasse 17, D-80333 München, Germany.

PMID: 16940325

Abstract

Motivation: A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features in the prediction of the secondary structure of a protein sequence using support vector machines (SVMs).
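The pipeline in the Motivation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a pattern is a gap-free amino acid substring and that its support is the number of database sequences containing it at least once. It also assumes each residue is encoded as a binary vector recording which frequent patterns occur in a window centred on it. The function names, the support threshold and the window size are hypothetical. The level-wise search follows the Apriori idea: a candidate of length k+1 is counted only if both of its length-k substrings are already frequent.

```python
def support(pattern, sequences):
    """Number of sequences that contain `pattern` as a contiguous substring."""
    return sum(1 for s in sequences if pattern in s)


def levelwise_frequent_patterns(sequences, min_support, max_len=5):
    """Apriori-style level-wise search for frequent amino acid patterns.

    Assumption (hypothetical, not from the paper): patterns are gap-free
    substrings; support counts sequences, not occurrences.
    """
    alphabet = sorted(set("".join(sequences)))
    # Level 1: frequent single residues.
    level = [a for a in alphabet if support(a, sequences) >= min_support]
    frequent = list(level)
    k = 1
    while level and k < max_len:
        level_set = set(level)
        candidates = []
        for prefix in level:          # the length-k prefix is frequent by construction
            for a in alphabet:
                cand = prefix + a
                # Anti-monotone pruning: the length-k suffix must be frequent too.
                if cand[1:] in level_set:
                    candidates.append(cand)
        # Only surviving candidates are counted against the database.
        level = [c for c in candidates if support(c, sequences) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent


def window_features(sequence, position, patterns, half_window=3):
    """Hypothetical binary encoding of one residue: 1 if a frequent pattern
    occurs inside the window centred on `position`, else 0."""
    start = max(0, position - half_window)
    end = min(len(sequence), position + half_window + 1)
    window = sequence[start:end]
    return [1 if p in window else 0 for p in patterns]


if __name__ == "__main__":
    seqs = ["MKVLAAGIV", "MKVLSAGLV", "AKVLAAGIV"]
    patterns = levelwise_frequent_patterns(seqs, min_support=3, max_len=4)
    print(patterns)  # ['A', 'G', 'K', 'L', 'V', 'AG', 'KV', 'VL', 'KVL']
    print(window_features(seqs[0], 2, patterns, half_window=1))
    # [0, 0, 1, 1, 1, 0, 1, 1, 1]
```

In this sketch, one such vector per residue, paired with its observed helix/strand/coil label, would serve as a training example for an SVM classifier. The SVM step is omitted here. The pruning step is what keeps the search tractable: a pattern is only counted against the database if every shorter pattern it contains has already passed the support threshold.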

Results: Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared to predictions of PSI-PRED, PHD and PROFsec. Despite being trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to PSI-PRED and PROFsec. Finally, we show that the method contributes significant information to consensus predictions.

Availability: The method is available from the authors upon request.

Publication types

Evaluation Study

MeSH terms

Algorithms*
Amino Acid Sequence
Computer Simulation
Models, Chemical*
Models, Molecular*
Molecular Sequence Data
Protein Structure, Secondary*
Proteins / chemistry*
Proteins / ultrastructure
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*

Substances

Proteins