Use of a Structural Alphabet to Find Compatible Folds for Amino Acid Sequences

Protein Sci. 2015 Jan;24(1):145-53. doi: 10.1002/pro.2581. Epub 2014 Oct 25.


The structural annotation of proteins that have no homologs of known 3D structure detectable by sequence-search methods remains a major challenge. We propose an original method that computes the conditional probabilities for the amino acid sequence of a protein to fit known protein 3D structures using a structural alphabet known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. They are used to encode 3D protein structures into 1D PB sequences and to capture sequence-to-structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences onto protein folds encoded as PB sequences. It does not use any information from residue contacts or sequence-search methods, nor does it explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivities of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in the PDB: 38 compatible templates were identified by our approach, and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotation at the genomic level. It has been implemented in the form of a web server that is freely available at
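The scoring idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `thread_score` and `z_score`, the single-position score table, the ungapped equal-length alignment, and the shuffle-based null distribution are all assumptions made for illustration, and the toy matrix values are invented rather than taken from the paper.

```python
import random
import statistics


def thread_score(seq, pb_seq, matrix):
    """Sum per-position scores for an ungapped, equal-length threading.

    matrix[pb][aa] is a hypothetical score (e.g. a log-odds value derived
    from an amino acid occurrence matrix) for residue aa at a position
    whose local backbone is encoded by Protein Block pb.
    """
    if len(seq) != len(pb_seq):
        raise ValueError("sequence and PB sequence must have equal length")
    return sum(matrix[pb][aa] for aa, pb in zip(seq, pb_seq))


def z_score(seq, pb_seq, matrix, n_shuffles=200, seed=0):
    """Compare the observed score with scores of shuffled query sequences.

    Assumption: the null distribution is built by shuffling the query,
    which preserves its amino acid composition.
    """
    rng = random.Random(seed)
    observed = thread_score(seq, pb_seq, matrix)
    residues = list(seq)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(thread_score(residues, pb_seq, matrix))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    return (observed - mean) / sd if sd > 0 else 0.0


# Toy example with two PB letters and two amino acids (invented values).
toy_matrix = {
    "m": {"A": 1.0, "V": -1.0},
    "d": {"A": -1.0, "V": 1.0},
}
good = z_score("AAVV", "mmdd", toy_matrix)
bad = z_score("VVAA", "mmdd", toy_matrix)
```

In this toy setting the query that matches the PB preferences receives a positive Z-score and the reversed query a negative one. A real application would use all 16 PBs, the 20 amino acids, and a cutoff calibrated on a benchmark such as the one described in the abstract.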

Keywords: fold recognition; protein blocks; protein domains; protein structures; sequence-structure relationship; structural alphabet; structural annotation; threading.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Conformation
  • Protein Folding*
  • Proteins / chemistry*
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*


Substances

  • Proteins

Associated data

  • PDB/2LTE
  • PDB/3GZL