AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model

Biomed Res Int. 2015:2015:678764. doi: 10.1155/2015/678764. Epub 2015 Aug 3.

Abstract

Motivation: The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possible protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are clearly related to each other, it is challenging to exploit this dependency for prediction.

Method: We present AcconPred, a method for predicting solvent accessibility and contact number simultaneously. It is based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. A multitask learning framework trained on a collection of related tasks provides more accurate predictions than one trained on a single task only. The CNF model not only captures the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels.
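The shared-weight idea can be illustrated with a minimal sketch in plain Python. This is not the AcconPred implementation: it shows only a hidden layer shared by two task-specific output heads, and it omits the CNF terms that couple adjacent labels along the sequence. The class name `SharedMultitaskNet`, the layer sizes, the 20-dimensional feature vector, and the choice of 15 contact-number states are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedMultitaskNet:
    """Hypothetical two-task network: one shared hidden layer, two heads."""

    def __init__(self, n_features, n_hidden, n_acc_states, n_cn_states, seed=0):
        rng = random.Random(seed)
        # Shared weights: both tasks read the same hidden representation.
        self.shared = init_matrix(n_hidden, n_features, rng)
        # Task-specific weights for solvent accessibility and contact number.
        self.acc_head = init_matrix(n_acc_states, n_hidden, rng)
        self.cn_head = init_matrix(n_cn_states, n_hidden, rng)

    def forward(self, features):
        """Return per-residue label probabilities for both tasks."""
        hidden = [math.tanh(h) for h in matvec(self.shared, features)]
        acc_probs = softmax(matvec(self.acc_head, hidden))
        cn_probs = softmax(matvec(self.cn_head, hidden))
        return acc_probs, cn_probs


# Assumed sizes: 20 input features, 3 accessibility states, 15 contact-number states.
net = SharedMultitaskNet(n_features=20, n_hidden=8, n_acc_states=3, n_cn_states=15)
residue_features = [0.05] * 20  # placeholder feature vector for one residue
acc_probs, cn_probs = net.forward(residue_features)
```

In such a design, gradients from both tasks would update the shared layer during training, which is one way the dependency between the two properties can be exploited.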

Results: Trained on a dataset of 5729 monomeric soluble globular proteins, AcconPred reaches a three-state accuracy of 0.68 for solvent accessibility and a correlation of 0.75 for contact number. Tested on the 105 CASP11 domains for solvent accessibility, AcconPred reaches an accuracy of 0.64, outperforming existing methods.
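For readers unfamiliar with the two reported measures, the sketch below shows how they could be computed. It assumes that three-state accuracy is the fraction of residues whose predicted state equals the true state, and that the reported correlation is the Pearson coefficient; the abstract does not state which correlation is used. The state labels `B`, `I`, `E` (buried, intermediate, exposed) and the toy values are illustrative only and are not data from the paper.

```python
import math


def three_state_accuracy(true_states, predicted_states):
    """Fraction of residues whose predicted state matches the true state."""
    if len(true_states) != len(predicted_states) or not true_states:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy example: four of five accessibility states agree.
true_acc = ["B", "I", "E", "E", "B"]
pred_acc = ["B", "I", "E", "B", "B"]
accuracy = three_state_accuracy(true_acc, pred_acc)  # 0.8

# Toy contact numbers: the prediction is an exact linear function of the truth.
true_cn = [1, 2, 3, 4]
pred_cn = [2, 4, 6, 8]
correlation = pearson_correlation(true_cn, pred_cn)
```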

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Binding Sites
  • Databases, Protein
  • Models, Molecular
  • Neural Networks, Computer
  • Protein Conformation
  • Protein Folding*
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / metabolism
  • Solvents / chemistry*

Substances

  • Proteins
  • Solvents