Knowledge-based prediction of protein backbone conformation using a structural alphabet

PLoS One. 2017 Nov 21;12(11):e0186215. doi: 10.1371/journal.pone.0186215. eCollection 2017.

Abstract

Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven very useful in many aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each five residues long. With this approach, a protein structure is described as a string of Protein Blocks. The general objective of this work is to predict the local structure of a protein in terms of Protein Blocks. To this end, we propose a new approach, PB-kPRED. It involves (i) organizing the structural knowledge as a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm, which relies on neither secondary structure predictions nor sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein's local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. PB-kPRED achieved mean accuracies ranging from 40.8% to 66.3%, depending on the availability of homologues. The impact of different database scanning strategies on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method for proteins without any known structural homologues. We further developed a scoring function that estimates the prediction accuracy well (R² of 0.82). An online version of the tool is freely available for non-commercial use at http://www.bo-protscience.fr/kpred/.
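To make the idea of a pentapeptide fragment database more concrete, here is a minimal Python sketch. It is not the PB-kPRED implementation and does not reproduce its scanning strategies or its use of homologues. It assumes that each training protein is given as an amino acid sequence paired with a PB string of the same length, with the PB assigned to the central residue of each five-residue window and a placeholder letter 'Z' at undefined termini. It predicts by majority vote over exact pentapeptide matches and falls back to a user-supplied default PB where no match exists. The function names `build_fragment_db`, `predict_pbs` and `accuracy`, the input format and the 'Z' placeholder are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# The 16 Protein Blocks, labelled here with the letters a to p.
PB_LETTERS = "abcdefghijklmnop"


def build_fragment_db(training_set):
    """Map each pentapeptide to counts of the PB seen at its central residue.

    training_set: iterable of (sequence, pb_string) pairs of equal length
    (hypothetical input format). Positions whose PB is not one of the 16
    letters (e.g. the assumed 'Z' placeholder) are skipped.
    """
    db = defaultdict(Counter)
    for seq, pbs in training_set:
        if len(seq) != len(pbs):
            raise ValueError("sequence and PB string must have equal length")
        for i in range(2, len(seq) - 2):
            if pbs[i] in PB_LETTERS:
                db[seq[i - 2:i + 3]][pbs[i]] += 1
    return db


def predict_pbs(seq, db, fallback):
    """Predict one PB per residue by majority vote over exact pentapeptide matches.

    Residues without a full five-residue window, or whose window never occurs
    in the database, receive the hypothetical default PB `fallback`.
    """
    pred = []
    for i in range(len(seq)):
        window = seq[i - 2:i + 3] if 2 <= i <= len(seq) - 3 else None
        counts = db.get(window) if window else None  # .get avoids creating keys
        pred.append(counts.most_common(1)[0][0] if counts else fallback)
    return "".join(pred)


def accuracy(pred, true):
    """Fraction of positions with a defined reference PB that were predicted correctly."""
    pairs = [(p, t) for p, t in zip(pred, true) if t in PB_LETTERS]
    return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    train = [("AAAAAAA", "ZZmmmZZ"), ("GGAAAAAGG", "ZZccmccZZ")]
    db = build_fragment_db(train)
    pred = predict_pbs("TAAAAAG", db, fallback="a")
    print(pred)                       # aaamcaa
    print(accuracy(pred, "ZZmmmZZ"))  # 0.3333333333333333
```

Exact matching on a toy database leaves most positions unmatched, which is why the fallback dominates the example output; a realistic predictor would need a large fragment database and some strategy for inexact or partial matches, which this sketch deliberately leaves out.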

MeSH terms

  • Algorithms
  • Amino Acid Sequence / genetics
  • Databases, Protein*
  • Protein Conformation*
  • Protein Folding
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteomics*
  • Sequence Analysis, Protein

Substances

  • Proteins

Grants and funding

This work was supported by the Région Réunion and the European Social Fund (Fonds Social Européen) [grant no. 20131528] to IV. This work was supported in part by the Conseil Régional des Pays de la Loire in the framework of the GRIOTE project. AdB and FC acknowledge grants from the Ministry of Research (France), the National Institute for Blood Transfusion (INTS, France), the National Institute for Health and Medical Research (INSERM, France) and the labex GR-Ex. The labex GR-Ex, reference ANR-11-LABX-0051, is funded by the program “Investissements d’avenir” of the French National Research Agency, reference ANR-11-IDEX-0005-02. AdB acknowledges support from University Paris Diderot, Sorbonne, Paris Cité (France); FC acknowledges support from the Université de La Réunion, Faculty of Sciences and Technology. NS and AdB acknowledge the Indo-French Centre for the Promotion of Advanced Research (CEFIPRA) for a collaborative grant (number 5302-2). Research in NS's laboratory is also supported by the Department of Biotechnology, Government of India. NS is a J.C. Bose National Fellow.