Discrimination of outer membrane proteins using machine learning algorithms

Proteins. 2006 Jun 1;63(4):1031-7. doi: 10.1002/prot.20929.

Abstract

Discriminating outer membrane proteins (OMPs) from globular and membrane proteins of other folding types is an important task, both for identifying OMPs in genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we analyzed the performance of different methods for discriminating OMPs, including methods based on Bayes rules, logistic functions, neural networks, support vector machines, and decision trees. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method discriminated OMPs from other proteins [globular/transmembrane helical (TMH)] with a fivefold cross-validation accuracy of 91.0% on a dataset of 1,088 proteins. The accuracy of discriminating globular proteins was 88.8%, and that of TMH proteins was 93.7%. Further, the neural network method was tested on globular proteins belonging to 30 different folding types, and it successfully excluded 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds were eliminated with 100% accuracy. These accuracy levels are comparable to or better than those of other methods in the literature. We suggest that this method could be used effectively to discriminate OMPs and to detect OMPs in genomic sequences.
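
The fivefold cross-validation and the per-class accuracies reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the input features or software, so the sketch assumes amino acid composition as the feature vector and swaps in a simple nearest-centroid classifier as a stand-in for the neural network. The function names (`composition`, `k_fold_indices`, `cross_validate`) are hypothetical, and the sequences are made-up toy strings, not real proteins.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k folds; index i goes to fold i mod k."""
    return [list(range(i, n, k)) for i in range(k)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validate(features, labels, k=5):
    """Return per-class accuracy from k-fold cross-validation.

    Assumption: a nearest-centroid classifier stands in for the
    neural network described in the abstract.
    """
    correct = Counter()
    total = Counter(labels)
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        # Build one centroid per class from the training examples only.
        by_class = {}
        for i, (x, y) in enumerate(zip(features, labels)):
            if i not in held_out:
                by_class.setdefault(y, []).append(x)
        centroids = {y: centroid(xs) for y, xs in by_class.items()}
        # Assign each held-out example to the class with the closest centroid.
        for i in test_idx:
            predicted = min(
                centroids,
                key=lambda y: squared_distance(features[i], centroids[y]),
            )
            if predicted == labels[i]:
                correct[labels[i]] += 1
    return {y: correct[y] / total[y] for y in total}


# Toy data only: arbitrary strings chosen to be trivially separable.
toy_sequences = [
    "GSGTGSGSYG", "SGSGTGYGSG", "GTGSGSGSGY", "YGSGSGTGSG", "GSGYGSGTGS",
    "LLVALLIAVL", "ALLVLLAIVL", "LVLLALIVAL", "VLLALLVAIL", "LLAVLIVLLA",
]
toy_labels = ["OMP"] * 5 + ["other"] * 5

result = cross_validate([composition(s) for s in toy_sequences], toy_labels)
print(result)
```

Reporting accuracy per class, as the abstract does for globular and TMH proteins, shows whether a single overall figure hides weaker performance on one group. With real data, the folds would normally be shuffled and stratified by class rather than assigned by index.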

MeSH terms

  • Algorithms*
  • Amino Acids / chemistry
  • Bacterial Outer Membrane Proteins / chemistry*
  • Bacterial Outer Membrane Proteins / classification
  • Bacterial Outer Membrane Proteins / metabolism*
  • Computers
  • Protein Folding
  • Reproducibility of Results
  • Sensitivity and Specificity

Substances

  • Amino Acids
  • Bacterial Outer Membrane Proteins