On the encoding of proteins for disordered regions prediction

PLoS One. 2013 Dec 16;8(12):e82252. doi: 10.1371/journal.pone.0082252. eCollection 2013.

Abstract

Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play diverse and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to predict disordered regions of proteins automatically. One key factor in the success of these methods is the way in which protein information is encoded into features. We recently proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from the PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their solvent accessibility, which plays the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, its accuracy is very competitive with that of state-of-the-art methods. A web application is available at http://m24.giga.ulg.ac.be:81/x3Disorder.
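The idea of treating each residue independently, with one fixed-length feature vector per residue, can be illustrated with a minimal sliding-window encoder. This is a sketch under stated assumptions, not the authors' x3Disorder pipeline: the one-hot amino-acid profile, the window half-width, and the helper names `one_hot` and `window_features` are hypothetical, and the solvent-accessibility proximity feature described in the abstract is not reproduced here.

```python
from typing import List

# Hypothetical per-residue input: one numeric vector per residue
# (here a one-hot amino-acid encoding; an assumption, not the paper's features).
Profile = List[List[float]]

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence: str, alphabet: str = ALPHABET) -> Profile:
    """Encode each residue as a one-hot vector over the 20 standard amino acids.
    Letters outside the alphabet (e.g. 'X') map to an all-zero vector."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    rows: Profile = []
    for aa in sequence:
        row = [0.0] * len(alphabet)
        if aa in index:
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def window_features(profile: Profile, half_width: int) -> List[List[float]]:
    """Build one feature row per residue by concatenating the vectors of the
    residues in a window of size 2 * half_width + 1 centred on it.
    Window slots that fall outside the chain are zero-filled, and each slot
    carries one extra indicator set to 1.0 when it is padding."""
    n = len(profile)
    dim = len(profile[0]) if profile else 0
    rows: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for offset in range(-half_width, half_width + 1):
            j = i + offset
            if 0 <= j < n:
                row.extend(profile[j])
                row.append(0.0)  # slot lies inside the chain
            else:
                row.extend([0.0] * dim)
                row.append(1.0)  # slot lies beyond a chain terminus
        rows.append(row)
    return rows


if __name__ == "__main__":
    X = window_features(one_hot("MKVLA"), half_width=2)
    # 5 residues; 5 window slots x (20 amino acids + 1 padding flag) = 105 features
    print(len(X), len(X[0]))  # 5 105
```

Each row could then be paired with a per-residue disorder label and passed to an extremely randomized trees classifier, for example scikit-learn's `ExtraTreesClassifier`. In practice, richer per-residue vectors (such as position-specific scoring matrices carrying evolutionary information) would replace or extend the one-hot encoding used in this sketch.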

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Models, Molecular*
  • Protein Conformation*
  • Sequence Analysis, Protein / methods*
  • Software

Grants and funding

This work is supported by a FRIA (Fund for Research Training in Industry and Agriculture) fellowship (Julien Becker) granted by the Belgian National Fund of Scientific Research (FNRS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.