Structural fragment clustering reveals novel structural and functional motifs in alpha-helical transmembrane proteins

BMC Bioinformatics. 2010 Apr 26:11:204. doi: 10.1186/1471-2105-11-204.

Abstract

Background: A large proportion of an organism's genome encodes for membrane proteins. Membrane proteins are important for many cellular processes, and several diseases can be linked to mutations in them. With the tremendous growth of sequence data, there is an increasing need to reliably identify membrane proteins from sequence, to functionally annotate them, and to correctly predict their topology.

Results: We introduce a technique called structural fragment clustering, which learns sequential motifs from 3D structural fragments. From over 500,000 fragments, we obtain 213 statistically significant, non-redundant, and novel motifs that are highly specific to alpha-helical transmembrane proteins. From these 213 motifs, 58 of them were assigned to function and checked in the scientific literature for a biological assessment. Seventy percent of the motifs are found in co-factor, ligand, and ion binding sites, 30% at protein interaction interfaces, and 12% bind specific lipids such as glycerol or cardiolipins. The vast majority of motifs (94%) appear across evolutionarily unrelated families, highlighting the modularity of functional design in membrane proteins. We describe three novel motifs in detail: (1) a dimer interface motif found in voltage-gated chloride channels, (2) a proton transfer motif found in heme-copper oxidases, and (3) a convergently evolved interface helix motif found in an aspartate symporter, a serine protease, and cytochrome b.

Conclusions: Our findings suggest that functional modules exist in membrane proteins, and that they occur in completely different evolutionary contexts and cover different binding sites. Structural fragment clustering allows us to link sequence motifs to function through clusters of structural fragments. The sequence motifs can be applied to identify and characterize membrane proteins in novel genomes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Binding Sites
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Membrane Proteins / chemistry*
  • Models, Molecular
  • Protein Folding
  • Protein Interaction Mapping / methods
  • Protein Structure, Tertiary

Substances

  • Membrane Proteins