Detecting protein candidate fragments using a structural alphabet profile comparison approach

PLoS One. 2013 Nov 26;8(11):e80493. doi: 10.1371/journal.pone.0080493. eCollection 2013.


Predicting accurate fragments from sequence has recently become a critical step in protein structure modeling, as protein fragment assembly techniques are presently among the most efficient approaches for de novo prediction. Given the sequence of a protein to model, a key step in these approaches is identifying relevant fragments (candidate fragments) in a collection of available 3D structures. These fragments can then be assembled to produce a model of the complete structure of the protein of interest. The search for candidate fragments is classically performed using local sequence similarity, through profile comparison or threading approaches. In the present study, we introduce a new profile comparison approach that, instead of amino acid profiles, uses predicted structural alphabet profiles, which encode the local 3D shapes associated with the sequence. We show that structural alphabet profile-profile comparison can be used efficiently to retrieve accurate structural fragments, and we introduce a new protocol for the detection of candidate fragments. It identifies fragments specific to each position of the sequence, with lengths varying between 6 and 27 amino acids. We find that it outperforms present state-of-the-art approaches in terms of (i) the accuracy of the identified fragments and (ii) the rate of true positives identified, while maintaining a high coverage score. We illustrate the relevance of the approach on the complete target sets of the two previous rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP), CASP9 and CASP10. A web server for the approach is freely available at
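The core idea, comparing the predicted structural alphabet profile of a query against profiles of known structures to pick a candidate fragment for a given position, can be sketched as an ungapped sliding-window search. This is a minimal sketch under stated assumptions, not the authors' method: profiles are taken to be lists of per-residue probability vectors over a hypothetical structural alphabet, the column score is a plain dot product swapped in for illustration (the abstract does not give the actual scoring function), and `column_score`, `window_score` and `best_fragment` are hypothetical names. Only the 6 to 27 residue fragment length range comes from the abstract.

```python
from typing import List, Tuple

# Hypothetical representation: one probability vector over the structural
# alphabet per residue position.
Profile = List[List[float]]


def column_score(p: List[float], q: List[float]) -> float:
    """Dot product of two profile columns (an illustrative stand-in score)."""
    return sum(a * b for a, b in zip(p, q))


def window_score(query: Profile, target: Profile, qi: int, ti: int, length: int) -> float:
    """Mean column score over an ungapped window of `length` positions."""
    total = 0.0
    for k in range(length):
        total += column_score(query[qi + k], target[ti + k])
    return total / length


def best_fragment(query: Profile, target: Profile, qi: int, length: int) -> Tuple[int, float]:
    """Scan `target` for the best ungapped match to query[qi:qi + length].

    Returns (target start offset, mean score). Ties keep the earliest offset.
    """
    if not 6 <= length <= 27:
        raise ValueError("fragment length must be between 6 and 27")
    if qi < 0 or qi + length > len(query) or length > len(target):
        raise ValueError("window does not fit in the profiles")
    best_offset, best_score = -1, float("-inf")
    for ti in range(len(target) - length + 1):
        s = window_score(query, target, qi, ti, length)
        if s > best_score:
            best_offset, best_score = ti, s
    return best_offset, best_score
```

For example, with a toy three-letter alphabet, a query whose first three columns favor letter A and last three favor letter B is best matched where the target shows the same A-then-B pattern:

```python
A, B, C = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
query = [A] * 3 + [B] * 3
target = [B] * 2 + [A] * 3 + [B] * 3 + [C] * 2
print(best_fragment(query, target, 0, 6))  # (2, 1.0)
```

A real protocol would repeat this search over every query position, every allowed length and a whole structure library, then rank and filter the hits; those steps are omitted here.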

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Models, Molecular*
  • Peptide Fragments / chemistry*
  • Protein Conformation
  • Proteins / chemistry*

Substances

  • Peptide Fragments
  • Proteins

Grants and funding

UMR-S 973 INSERM recurrent funding and IA Bioinformatique BipBip grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.