Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping

Nucleic Acids Res. 2000 Oct 15;28(20):4029-36. doi: 10.1093/nar/28.20.4029.

Abstract

We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters, and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms, we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where each genome is treated as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to group related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established within each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
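
The grouping step can be illustrated with a small sketch. The abstract does not define P-quasi complete linkage, so the code below assumes one common reading: a group is P-quasi complete when every member is linked to at least a fraction P of the other members. The greedy grouping loop, the function names, and the toy cluster labels are all hypothetical and are not the authors' procedure.

```python
def is_p_quasi_complete(members, edges, p):
    """Check the ASSUMED criterion: each member links to >= p * (n - 1) others."""
    members = set(members)
    if len(members) < 2:
        return True
    need = p * (len(members) - 1)
    for node in members:
        degree = sum(
            1
            for other in members
            if other != node and frozenset((node, other)) in edges
        )
        if degree < need:
            return False
    return True


def greedy_groups(nodes, edges, p):
    """Hypothetical greedy grouping: put each node into the first group
    that remains P-quasi complete, otherwise start a new group."""
    groups = []
    for node in nodes:
        for group in groups:
            if is_p_quasi_complete(group | {node}, edges, p):
                group.add(node)
                break
        else:
            groups.append({node})
    return groups


# Toy input: nodes stand for correlated clusters from pairwise comparisons,
# edges for pairs of clusters judged to be related (made-up data).
nodes = ["A", "B", "C", "D", "E"]
edges = {
    frozenset(pair)
    for pair in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
}
groups = greedy_groups(nodes, edges, p=0.4)
print(groups)
```

With P = 0.4, the node D is linked to only one of the three members of the first group, which falls below the assumed threshold of 0.4 × 3 = 1.2 links, so it is kept separate. Lowering P relaxes the criterion and merges more loosely related clusters.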

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Automation
  • Computational Biology / methods*
  • Conserved Sequence / genetics*
  • Databases, Factual
  • Gene Order / genetics
  • Genes, Archaeal / genetics
  • Genes, Bacterial / genetics
  • Genes, Fungal / genetics
  • Genetic Linkage / genetics
  • Genome*
  • Genomics / methods
  • Multigene Family / genetics*
  • Open Reading Frames / genetics
  • Operon / genetics
  • Phylogeny
  • Probability*
  • Recombination, Genetic / genetics
  • Sequence Alignment / methods*
  • Sequence Homology