Accurate single-sequence prediction of solvent accessible surface area using local and global features

Proteins. 2014 Nov;82(11):3170-6. doi: 10.1002/prot.24682. Epub 2014 Sep 25.

Abstract

We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead, we use only sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is far more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introducing the global inputs contributes significantly to achieving this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and is found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org.
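The kind of input described in the abstract (a local sequence window plus chain-wide single-residue and two-residue compositions) can be illustrated with a minimal sketch. This is not the ASAquick or GENN code: the one-hot window encoding, the window half-width of 3, the padding flag, the restriction of two-residue composition to adjacent pairs, and all function names are assumptions made for illustration only.

```python
from itertools import product

# Illustrative encoding only; the actual ASAquick inputs may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAIR_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def single_composition(seq):
    """Global feature: fraction of each of the 20 residue types in the chain."""
    counts = [0.0] * len(AMINO_ACIDS)
    total = 0
    for aa in seq:
        if aa in AA_INDEX:
            counts[AA_INDEX[aa]] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def pair_composition(seq):
    """Global feature: fraction of each adjacent residue pair (assumed definition)."""
    counts = [0.0] * len(PAIR_INDEX)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in PAIR_INDEX:
            counts[PAIR_INDEX[pair]] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def window_one_hot(seq, center, half_width):
    """Local feature: one-hot residues in a window around `center`.

    Each position gets 21 slots: 20 residue types plus one flag that marks
    positions beyond the chain ends or non-standard residues.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq) and seq[pos] in AA_INDEX:
            slot[AA_INDEX[seq[pos]]] = 1.0
        else:
            slot[-1] = 1.0
        features.extend(slot)
    return features


def residue_features(seq, center, half_width=3):
    """Concatenate local window features with the two global composition vectors."""
    return (
        window_one_hot(seq, center, half_width)
        + single_composition(seq)
        + pair_composition(seq)
    )


if __name__ == "__main__":
    chain = "ACAC"  # toy sequence, not real data
    vec = residue_features(chain, center=0)
    print(len(vec))  # (2*3 + 1) * 21 window slots + 20 + 400 global slots
```

In this sketch the global composition vectors are identical for every residue of a chain, so each per-residue input carries chain-level context without any alignment step. A vector of this form would then be passed to a regression model that outputs the predicted ASA for the central residue.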

Keywords: ASA prediction; accessible surface area; automatic learning; protein.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Databases, Protein
  • Neural Networks, Computer*
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / metabolism
  • Solvents / chemistry

Substances

  • Proteins
  • Solvents