Computational identification of MoRFs in protein sequences

Bioinformatics. 2015 Jun 1;31(11):1738-44. doi: 10.1093/bioinformatics/btv060. Epub 2015 Jan 30.

Abstract

Motivation: Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is the binding of molecular recognition features (MoRFs) to globular protein domains in a process known as a disorder-to-order transition. Predicting the location of MoRFs in protein sequences with high accuracy remains an important computational challenge.

Method: In this study, we introduce MoRFCHiBi, a new computational approach for fast and accurate prediction of MoRFs in protein sequences. MoRFCHiBi combines the outcomes of two support vector machine (SVM) models that take advantage of two different kernels with high noise tolerance. The first, SVMS, is designed to extract maximal information from the general contrast in amino acid composition among MoRFs, their surrounding regions (Flanks), and the remainder of each sequence. The second, SVMT, is used to identify similarities between regions in a query sequence and MoRFs of the training set.
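The composition contrast that SVMS draws on can be illustrated with a small sketch. This is not the MoRFCHiBi feature set or its kernel, which the abstract does not specify; the function names, the flank width of 5 residues, and the example sequence are all assumptions chosen for illustration. It only shows how the amino acid composition of a candidate region might be compared with that of its flanks and of the rest of the sequence.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the fraction of each standard amino acid in a segment."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def contrast_features(sequence, start, end, flank=5):
    """Contrast a candidate region [start, end) with its flanks and the rest.

    The flank width is a hypothetical parameter, not a value from the paper.
    """
    left = max(0, start - flank)
    right = min(len(sequence), end + flank)
    region = sequence[start:end]
    flanks = sequence[left:start] + sequence[end:right]
    remainder = sequence[:left] + sequence[right:]
    comp_region = composition(region)
    comp_flanks = composition(flanks)
    comp_rest = composition(remainder)
    # Per-residue differences in composition (20 values each).
    return {
        "region_vs_flanks": [r - f for r, f in zip(comp_region, comp_flanks)],
        "region_vs_rest": [r - s for r, s in zip(comp_region, comp_rest)],
    }


# Made-up example sequence and candidate region, for illustration only.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
features = contrast_features(seq, start=10, end=20, flank=5)
```

Vectors of this kind could serve as input to a classifier such as an SVM. How MoRFCHiBi actually encodes the contrast and combines the SVMS and SVMT outcomes is described in the full paper, not in this abstract.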

Results: We evaluated the performance of our predictor by comparing its results with those of two currently available MoRF predictors, MoRFpred and ANCHOR. Using three test sets that have previously been collected and used to evaluate MoRFpred and ANCHOR, we demonstrate that MoRFCHiBi outperforms the other predictors with respect to different evaluation metrics. In addition, MoRFCHiBi is downloadable and fast, which makes it useful as a component in other computational prediction tools.
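The abstract does not name the evaluation metrics used. As an assumption, one common choice for comparing per-residue predictors is the area under the ROC curve (AUC); the sketch below computes it from hypothetical per-residue labels (1 for a MoRF residue, 0 otherwise) and prediction scores. The data are invented for illustration and are not results from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive residue scores
    higher than a randomly chosen negative one (ties count as one half).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue annotations and predictor scores.
labels = [0, 0, 1, 1, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.5, 0.3]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of residues, which is fine for a short example; a rank-based formulation would be preferable for full test sets.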

Availability and implementation: http://www.chibi.ubc.ca/morf/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids
  • Computational Biology / methods
  • Intrinsically Disordered Proteins / chemistry*
  • Protein Structure, Tertiary
  • Sequence Analysis, Protein / methods*
  • Software*
  • Support Vector Machine

Substances

  • Amino Acids
  • Intrinsically Disordered Proteins