Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

BMC Bioinformatics. 2011 Jun 20;12:247. doi: 10.1186/1471-2105-12-247.

Abstract

Background: One strategy for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function.

Results: Here, we present a systematic extraction of seven-residue structural motifs from protein loops and explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which simplifies protein structures into one-dimensional sequences, and on advanced pattern statistics adapted to short sequences. Motifs of interest are selected by looking for structural motifs in protein loops that are significantly over-represented in SCOP superfamilies. We discovered two types of significantly over-represented structural motifs: (i) ubiquitous motifs, shared by several superfamilies, and (ii) superfamily-specific motifs, over-represented in only a few superfamilies. A comparison of ubiquitous motifs with known small structural motifs shows that they include well-described motifs such as turn, niche, and nest motifs. A comparison between superfamily-specific motifs and Swiss-Prot biological annotations reveals that some of them correspond to functional sites involved in binding small ligands, such as ATP/GTP, NAD(P) and SAH/SAM.
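The selection step described above can be illustrated with a rough Python sketch. It is not the authors' method: the abstract does not specify the pattern statistics used, so a plain binomial tail probability stands in for them here, with no multiple-testing correction. The sketch also assumes that each HMM-SA letter describes a four-residue fragment and that consecutive letters overlap by three residues, so a four-letter word spans seven residues. The function names, the add-one smoothing, the threshold `alpha`, and the toy loop strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb


def count_words(sequences, k=4):
    """Count, for each k-letter word, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set keeps each word once per sequence, so repeats inside one loop
        # do not inflate the count.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    return counts


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def over_represented(family_loops, background_loops, k=4, alpha=0.01):
    """Return {word: p-value} for words enriched in family_loops.

    Assumption: a simple binomial test stands in for the paper's
    pattern statistics, which the abstract does not describe.
    """
    fam = count_words(family_loops, k)
    bg = count_words(background_loops, k)
    n_fam, n_bg = len(family_loops), len(background_loops)
    hits = {}
    for word, observed in fam.items():
        # Background occurrence probability with add-one smoothing (hypothetical choice).
        p = (bg.get(word, 0) + 1) / (n_bg + 2)
        p_value = binomial_tail(n_fam, observed, p)
        if p_value < alpha:
            hits[word] = p_value
    return hits


# Toy structural-letter strings, invented for illustration only.
family_loops = ["KDRQH", "ADRQHB", "DRQHZ", "GDRQH"]
background_loops = [
    "ABCDE", "FGHIJ", "KLMNO", "PQRST", "UVWXY",
    "DRQHA", "ZZZZZ", "ABABA", "CDCDC", "EFEFE",
]

hits = over_represented(family_loops, background_loops)
print(hits)
```

In this toy data the word shared by every family loop is the only one flagged, which mirrors the idea of a superfamily-specific motif. A realistic analysis would need real HMM-SA encodings, a proper background model, and correction for the large number of words tested.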

Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

MeSH terms

  • Amino Acid Motifs*
  • Databases, Protein
  • Ligands
  • Markov Chains
  • Models, Molecular
  • Molecular Sequence Annotation / methods*
  • Protein Binding
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism

Substances

  • Ligands
  • Proteins