Discrimination of outer membrane proteins using machine learning algorithms

M Michael Gromiha; Makiko Suwa

doi:10.1002/prot.20929

Proteins. 2006 Jun 1;63(4):1031-7. doi: 10.1002/prot.20929.

Authors

M Michael Gromiha¹, Makiko Suwa

Affiliation

¹ Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan. michael-gromiha@aist.go.jp

PMID: 16493651

Abstract

Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task, both for identifying OMPs in genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods for discriminating OMPs, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, and others. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate OMPs from other proteins [globular/transmembrane helical (TMH)] with a fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method was tested with globular proteins belonging to 30 different folding types, and it successfully excluded 95% of the considered proteins. Proteins with SAM domain-like, knottin, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than those of other methods in the literature. We suggest that this method could be used effectively to discriminate OMPs and to detect them in genomic sequences.
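
The fivefold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not state the input features, so amino acid composition is used as a plausible choice; a nearest-centroid rule stands in for the neural network and the other classifiers; and the sequences are synthetic toy data, not the 1,088-protein dataset.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard residues (assumed feature set)."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]

def fivefold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validate(features, labels, k=5, seed=0):
    """k-fold accuracy of a nearest-centroid rule (a stand-in classifier)."""
    correct = 0
    for test_idx in fivefold_indices(len(features), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(features)) if i not in held_out]
        # Fit on the training folds only: one centroid per class.
        cents = {
            c: centroid([features[i] for i in train if labels[i] == c])
            for c in sorted(set(labels[i] for i in train))
        }
        # Score on the held-out fold.
        for i in test_idx:
            pred = min(cents, key=lambda c: sq_dist(features[i], cents[c]))
            correct += pred == labels[i]
    return correct / len(features)

# Hypothetical toy data: two classes drawn from disjoint residue alphabets.
rng = random.Random(1)

def toy_sequence(alphabet, length=60):
    return "".join(rng.choice(alphabet) for _ in range(length))

sequences = [toy_sequence("GSTNYQ") for _ in range(10)]
sequences += [toy_sequence("LIVAFM") for _ in range(10)]
labels = ["OMP"] * 10 + ["other"] * 10

features = [composition(s) for s in sequences]
accuracy = cross_validate(features, labels)
print(f"fivefold cross-validation accuracy: {accuracy:.2f}")
```

Each protein is held out exactly once, so the reported figure is the fraction of held-out predictions that were correct. The toy classes are separable by construction, so the number printed here says nothing about performance on real OMP data.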

© 2006 Wiley-Liss, Inc.

MeSH terms

Algorithms*
Amino Acids / chemistry
Bacterial Outer Membrane Proteins / chemistry*
Bacterial Outer Membrane Proteins / classification
Bacterial Outer Membrane Proteins / metabolism*
Computers
Protein Folding
Reproducibility of Results
Sensitivity and Specificity

Substances

Amino Acids
Bacterial Outer Membrane Proteins