Sibe: a computation tool to apply protein sequence statistics to predict folding and design in silico

BMC Bioinformatics. 2019 Sep 6;20(1):455. doi: 10.1186/s12859-019-2984-1.

Abstract

Background: Evolutionary information contained in the amino acid sequences of proteins specifies their biological function and fold, but exactly what information in the sequence drives both of these processes? Considerable progress has been made toward answering this fundamental question, but exploring the potential space of cooperative interactions between amino acids remains challenging. Statistical analysis plays a significant role in studying such interactions, and in recent years its use has expanded to applications ranging from coevolution-guided rational protein design to in silico protein folding.

Results: Here we describe a computational tool named Sibe for studying protein sequence, folding, and design, using evolutionary coupling between amino acids as a driving factor. In this study, Sibe is used to identify positionally conserved couplings between pairs of amino acids and to aid rational protein design. In this process, pairwise couplings are filtered according to the relative entropy computed from positional conservation and grouped into several 'blocks', which could contribute to driving protein folding and design. The human β2-adrenergic receptor (β2AR) was used to demonstrate that these 'blocks' contribute to rational design by specifying functional residues. Sibe also provides folding modules, based on both the positionally conserved couplings and well-established statistical potentials, for simulating protein folding in silico and predicting tertiary structure. Our results show that statistical inferences of basic evolutionary principles, such as conservation and coupled mutations, can be used to rapidly design a diverse set of proteins and to study protein folding.
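The conservation-based filtering step described in the Results can be illustrated with a minimal Python sketch. It assumes that positional conservation is scored as the relative entropy D_i = Σ_a f_i(a) ln(f_i(a) / q(a)) of the amino acid frequencies f_i(a) in alignment column i against a background distribution q(a), taken here to be uniform (1/20) for simplicity, and that a coupling (i, j) is kept only when both positions reach a chosen threshold. The function names, the uniform background, and the threshold rule are illustrative assumptions, not Sibe's actual API or selection criteria; the sketch also does not compute coupling strengths or group pairs into blocks.

```python
import math

# Standard 20 amino acids; gaps and non-standard residues are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_relative_entropy(msa, i, background=None):
    """Relative entropy D_i = sum_a f_i(a) * ln(f_i(a) / q(a)) of alignment column i.

    `background` defaults to a uniform q(a) = 1/20 (an illustrative assumption,
    not necessarily the background Sibe uses)."""
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    counts = {}
    for seq in msa:
        residue = seq[i]
        if residue in background:  # skip gaps such as '-' and unknown symbols
            counts[residue] = counts.get(residue, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log((n / total) / background[a])
               for a, n in counts.items())


def filter_conserved_pairs(msa, threshold):
    """Hypothetical filter: keep pairs (i, j), i < j, whose columns both reach `threshold`."""
    if not msa:
        return []
    length = len(msa[0])
    scores = [column_relative_entropy(msa, i) for i in range(length)]
    return [(i, j)
            for i in range(length)
            for j in range(i + 1, length)
            if scores[i] >= threshold and scores[j] >= threshold]
```

For example, on the toy alignment ["ACDA", "ACEA", "ACFG", "ACGA"], columns 0 and 1 are invariant and each score ln 20 ≈ 3.00, column 2 (four different residues) scores ln 5 ≈ 1.61, and column 3 scores about 2.43, so a threshold of 2.5 keeps only the pair (0, 1). A real analysis would typically use a background derived from observed amino acid frequencies and apply pseudocounts or sequence weighting.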

Conclusions: Sibe provides a computational tool for systematic analysis of proteins, from primary to tertiary structure, using evolutionary couplings as a driving factor. Written in C++, Sibe is designed for compatibility with the 'big data' era in biological science; it primarily focuses on protein sequence analysis but can also be extended to other modeling tasks and to predictions of experimental measurements.

Keywords: Computational protein design; Evolutionary coupling analysis; Protein folding; Protein structure prediction.

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Computer Simulation*
  • Entropy
  • Humans
  • Mutation
  • Protein Engineering*
  • Protein Folding*
  • Proteins / chemistry*
  • Proteins / genetics*
  • Receptors, Adrenergic, beta-2 / chemistry
  • Receptors, Adrenergic, beta-2 / genetics
  • Sequence Analysis
  • Software

Substances

  • Proteins
  • Receptors, Adrenergic, beta-2