A structural census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure

J Mol Biol. 1997 Dec 12;274(4):562-76. doi: 10.1006/jmbi.1997.1412.

Abstract

Representative genomes from each of the three kingdoms of life are compared in terms of protein structure, in particular, those of Haemophilus influenzae (a bacterium), Methanococcus jannaschii (an archaeon), and yeast (a eukaryote). The comparison is in the form of a census (or comprehensive accounting) of the relative occurrence of secondary and tertiary structures in the genomes, with particular emphasis on patterns of supersecondary structure. Comparison of secondary structure shows that the three genomes have nearly the same overall secondary-structure content, although they differ markedly in amino acid composition. Comparison of supersecondary structure, using a novel "frequent-words" approach, shows that yeast has a preponderance of consecutive strands (e.g. beta-beta-beta patterns), Haemophilus of consecutive helices (alpha-alpha-alpha), and Methanococcus of alternating strand-helix structures (beta-alpha-beta). Yeast also has significantly more helical membrane proteins than the other two genomes, with most of the differences concentrated in proteins containing two transmembrane segments. Comparison of tertiary structure (by sequence matching and domain-level clustering) highlights the substantial duplication in each genome (approximately 30% to 50%), with the degree of duplication following similar patterns in all three. Many sequence families are shared among the genomes, with the degree of overlap between any two genomes being roughly similar. In total, the three genomes contain 148 of the approximately 300 known protein folds. The 45 of these 148 folds that are present in all three genomes are especially enriched in mixed (alpha/beta) supersecondary structures. Moreover, the five most common of these 45 (the "top-5") have a remarkably similar supersecondary architecture, containing a central sheet of parallel strands with helices packed onto at least one face and beta-alpha-beta connections between adjacent strands.
These most basic molecular parts, which were presumably present in the last common ancestor of the three kingdoms, include the TIM-barrel, Rossmann, flavodoxin, thiamin-binding, and P-loop-hydrolase folds.
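The abstract does not spell out how the "frequent-words" approach works, so the following is only a minimal Python sketch of the general idea, built on assumptions of our own rather than the authors' procedure. It assumes each protein has already been reduced to a string of secondary-structure segments (H for a helix, E for a strand), counts overlapping three-letter "words" such as EEE, HHH, or EHE across a genome, and divides each word's observed frequency by the frequency expected if segments occurred independently. The function names `segment_words`, `word_frequencies`, and `enrichment`, the H/E encoding, the word length `k=3`, and the independence baseline are all hypothetical choices made for illustration.

```python
from collections import Counter


def segment_words(ss_string, k=3):
    """Return the overlapping k-letter words in a per-segment SS string.

    Assumed encoding (hypothetical): 'H' = helix segment, 'E' = strand segment.
    """
    return [ss_string[i:i + k] for i in range(len(ss_string) - k + 1)]


def word_frequencies(proteins, k=3):
    """Fraction of all k-letter words in a genome accounted for by each word."""
    counts = Counter()
    for ss in proteins:
        counts.update(segment_words(ss, k))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def enrichment(proteins, k=3):
    """Observed word frequency divided by the frequency expected if
    segment types were drawn independently (an assumed baseline)."""
    letters = Counter("".join(proteins))
    n = sum(letters.values())
    p = {a: c / n for a, c in letters.items()}
    ratios = {}
    for w, f in word_frequencies(proteins, k).items():
        expected = 1.0
        for a in w:
            expected *= p[a]
        ratios[w] = f / expected
    return ratios


if __name__ == "__main__":
    # Toy "genome" of three proteins; not data from the paper.
    toy = ["EEEE", "HEHEH", "HHH"]
    print(word_frequencies(toy))
    print(enrichment(toy))
```

On this toy input, EEE makes up one third of all words while the independence baseline predicts one eighth, giving a ratio of about 2.67. A strand-rich genome in the sense of the abstract would show EEE ratios of this kind, and an archaeal genome rich in alternating structures would show elevated EHE. Normalizing against composition matters here because, as the abstract notes, the genomes have similar overall secondary-structure content, so raw word counts alone could mostly reflect how many H and E segments each genome contains.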

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Archaea / genetics*
  • Bacteria / genetics*
  • Biological Evolution
  • Eukaryotic Cells / physiology*
  • Genome*
  • Models, Biological
  • Models, Molecular
  • Multigene Family
  • Protein Conformation
  • Protein Folding
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid

Substances

  • Proteins