Exploiting genomic patterns to discover new supramolecular protein assemblies

Protein Sci. 2009 Jan;18(1):69-79. doi: 10.1002/pro.1.

Abstract

Bacterial microcompartments are supramolecular protein assemblies that function as bacterial organelles by compartmentalizing particular enzymes and metabolic intermediates. The outer shells of these microcompartments are assembled from multiple paralogous structural proteins. Because the paralogs are required to assemble together, their genes are often transcribed together from the same operon, giving rise to a distinctive genomic pattern: multiple, typically small, paralogous proteins encoded in close proximity on the bacterial chromosome. To investigate the generality of this pattern in supramolecular assemblies, we employed a comparative genomics approach to search for protein families that show the same kind of genomic pattern as that exhibited by bacterial microcompartments. The results indicate that a variety of large supramolecular assemblies fit the pattern, including bacterial gas vesicles, bacterial pili, and small heat-shock protein complexes. The search also retrieved several widely distributed protein families of presently unknown function. The proteins from one of these families were characterized experimentally and found to exhibit behavior indicative of supramolecular assembly. We conclude that cotranscribed paralogs are a common feature of diverse supramolecular assemblies, and a useful genomic signature for discovering new kinds of large protein assemblies from genomic data.
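The genomic signature described above (several small paralogs of one family encoded close together, likely in one operon, and recurring across genomes) can be illustrated with a short search sketch. This is not the authors' pipeline. It assumes that each gene already carries a protein-family label, approximates an operon as a run of same-strand genes separated by at most `max_gap` base pairs, and uses illustrative thresholds (`max_gap=100`, `max_len=200` amino acids, `min_copies=2`, `min_genomes=2`). The names `Gene`, `operon_runs`, `cotranscribed_paralog_families`, and `recurring_families` are hypothetical, as are the toy genomes and coordinates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int      # chromosome start coordinate (bp)
    end: int        # inclusive chromosome end coordinate (bp)
    strand: str     # "+" or "-"
    family: str     # precomputed protein-family label (assumed input)
    length_aa: int  # encoded protein length in amino acids


def operon_runs(genes, max_gap=100):
    """Split genes into putative operons: consecutive genes on the same strand
    separated by at most max_gap bp (a crude, hypothetical operon proxy)."""
    ordered = sorted(genes, key=lambda g: g.start)
    runs, current = [], []
    for g in ordered:
        prev = current[-1] if current else None
        if prev and (g.strand != prev.strand or g.start - prev.end - 1 > max_gap):
            runs.append(current)
            current = []
        current.append(g)
    if current:
        runs.append(current)
    return runs


def cotranscribed_paralog_families(genes, max_gap=100, max_len=200, min_copies=2):
    """Return {family: max copies} for families with >= min_copies small
    members inside a single putative operon of one genome."""
    hits = {}
    for run in operon_runs(genes, max_gap):
        counts = Counter(g.family for g in run if g.length_aa <= max_len)
        for fam, n in counts.items():
            if n >= min_copies:
                hits[fam] = max(hits.get(fam, 0), n)
    return hits


def recurring_families(genomes, min_genomes=2, **kwargs):
    """Comparative step: count the genomes in which each family shows the
    pattern and keep families seen in at least min_genomes genomes."""
    seen = Counter()
    for genes in genomes.values():
        seen.update(cotranscribed_paralog_families(genes, **kwargs).keys())
    return {fam: n for fam, n in seen.items() if n >= min_genomes}


# Toy genomes with made-up coordinates and lengths.
genes = [
    Gene("g1", 100, 400, "+", "BMC", 95),
    Gene("g2", 450, 750, "+", "BMC", 98),
    Gene("g3", 800, 1100, "+", "BMC", 92),
    Gene("g4", 1150, 2650, "+", "AldDH", 480),  # large enzyme, filtered by size
    Gene("g5", 5000, 5300, "-", "BMC", 99),     # isolated copy, separate run
    Gene("g6", 9000, 9300, "+", "GvpA", 72),
    Gene("g7", 9350, 9650, "+", "GvpA", 70),
]
genes_b = [
    Gene("h1", 10, 310, "-", "BMC", 96),
    Gene("h2", 330, 630, "-", "BMC", 91),
]

print(cotranscribed_paralog_families(genes))                 # {'BMC': 3, 'GvpA': 2}
print(recurring_families({"A": genes, "B": genes_b}))        # {'BMC': 2}
```

Requiring the pattern to recur in several genomes is one simple way to reflect the comparative aspect of the search. A real analysis would also need genuine operon predictions and homology-based family assignments rather than the fixed gap and length cutoffs assumed here.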

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacteria / chemistry
  • Bacteria / ultrastructure*
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • Bacterial Proteins / metabolism
  • Cell Compartmentation / physiology
  • Comparative Genomic Hybridization
  • Computational Biology
  • Databases, Genetic
  • Genome, Bacterial
  • Genomics / methods*
  • Multiprotein Complexes / chemistry
  • Multiprotein Complexes / genetics*
  • Multiprotein Complexes / metabolism

Substances

  • Bacterial Proteins
  • Multiprotein Complexes