A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters

Nucleic Acids Res. 2000 Oct 15;28(20):4021-8. doi: 10.1093/nar/28.20.4021.

Abstract

The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes.
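The idea of a correlated cluster can be illustrated with a small sketch. The abstract does not give the details of the heuristic, so the code below is not the published algorithm; it is a simplified stand-in that assumes single-linkage grouping of gene-enzyme correspondences that lie close together in both graphs. The names `path_graph`, `bfs_distances`, `correlated_clusters` and the `max_gap` parameter are hypothetical, and the toy genome and pathway are made up for illustration.

```python
from collections import deque
from itertools import combinations


def path_graph(nodes):
    """Build an undirected chain graph (e.g. genes in chromosomal order)."""
    graph = {n: [] for n in nodes}
    for left, right in zip(nodes, nodes[1:]):
        graph[left].append(right)
        graph[right].append(left)
    return graph


def bfs_distances(graph, start, limit):
    """Shortest path lengths from `start` to every node within `limit` edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the allowed distance
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def correlated_clusters(graph_a, graph_b, pairs, max_gap=1):
    """Group node correspondences (a, b) that are close in BOTH graphs.

    Assumption: two correspondences are linked when their nodes are at most
    `max_gap` intervening nodes apart in each graph; clusters are the
    connected components of that linkage (single linkage).
    """
    limit = max_gap + 1
    near_a = {a: bfs_distances(graph_a, a, limit) for a, _ in pairs}
    near_b = {b: bfs_distances(graph_b, b, limit) for _, b in pairs}

    parent = list(range(len(pairs)))  # union-find over correspondences

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pairs)), 2):
        (a1, b1), (a2, b2) = pairs[i], pairs[j]
        if a2 in near_a[a1] and b2 in near_b[b1]:
            parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Toy data: six genes in a row, and a pathway with two separate branches.
genome = path_graph(["g1", "g2", "g3", "g4", "g5", "g6"])
pathway = {**path_graph(["e1", "e2", "e3"]), **path_graph(["e9", "e10"])}
pairs = [("g1", "e1"), ("g2", "e2"), ("g4", "e3"), ("g6", "e9")]

clusters = correlated_clusters(genome, pathway, pairs, max_gap=1)
print(clusters)
```

In this toy case, allowing one gap lets the correspondence for `g4` join the cluster even though `g3` has no pathway counterpart, while `g6` stays out because its enzyme sits in an unconnected part of the pathway graph. Setting `max_gap=0` requires strict adjacency and leaves only the `g1`/`g2` pair clustered.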

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Automation
  • Computational Biology / methods*
  • Conserved Sequence / genetics*
  • Databases, Factual
  • Enzymes / genetics*
  • Enzymes / metabolism*
  • Escherichia coli / enzymology
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Genome*
  • Genome, Archaeal
  • Genome, Bacterial
  • Genome, Fungal
  • Operon / genetics
  • Peptidoglycan / biosynthesis
  • Sequence Homology
  • Statistics as Topic

Substances

  • Enzymes
  • Peptidoglycan