Clustering of gene ontology terms in genomes

Gene. 2014 Oct 25;550(2):155-64. doi: 10.1016/j.gene.2014.06.060. Epub 2014 Jul 1.


Although protein-coding genes occupy only a small fraction of the genome in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related functions and/or characteristics has been observed at several different levels. To study how common the clustering of functionally related genes is, and what kinds of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, the method analyzes a fixed number of genes irrespective of the genome span they cover. Highly statistically significant GO term clustering was apparent in all the investigated genomes. Analysis windows ranging from 5 to 50 consecutive genes revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed; the complete dataset for each analyzed species is available at the GOme database. The results indicate that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but in all the studied species irrespective of their complexity. There are some differences between species, but in all of them GO term clusters are common and vary widely in size. The presented method can be applied to any genome, or part of a genome, for which descriptive features are available, and is thus not restricted to ontology terms. It can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of the mechanisms that shape genome structure and the evolutionary forces related to them.
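
The idea of scanning a fixed number of consecutive genes, rather than a fixed genomic span, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which statistic was used, so the hypergeometric tail probability here is an assumption, as are the function names `hypergeom_sf` and `scan_windows` and the placeholder label `TERM_A`. No correction for multiple testing is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    Assumed statistic for illustration; the source does not name its test.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def scan_windows(gene_terms, term, window):
    """Slide a window of `window` consecutive genes along one chromosome.

    gene_terms: list of sets of annotation terms, in chromosomal order.
    Returns (start_index, genes_with_term, p_value) for every window.
    """
    N = len(gene_terms)
    flags = [term in terms for terms in gene_terms]
    K = sum(flags)
    results = []
    count = sum(flags[:window])
    for start in range(N - window + 1):
        if start > 0:
            # Update the running count: add the gene entering the window,
            # subtract the gene leaving it.
            count += flags[start + window - 1] - flags[start - 1]
        results.append((start, count, hypergeom_sf(count, N, K, window)))
    return results


# Toy chromosome of 10 genes; genes 2-4 share the hypothetical term TERM_A.
toy = [set() for _ in range(10)]
for i in (2, 3, 4):
    toy[i].add("TERM_A")

best = min(scan_windows(toy, "TERM_A", window=3), key=lambda r: r[2])
print(best)
```

Because the window is defined by gene count, the same code applies unchanged to gene-dense and gene-sparse regions; repeating the scan for several window sizes (for example 5 to 50 genes) and for each annotation term would approximate the exhaustive analysis described above.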

Keywords: Bioinformatics; Computational biology; Genomics; Systems biology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Caenorhabditis elegans / genetics
  • Chromosome Mapping
  • Cluster Analysis
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Drosophila melanogaster / genetics
  • Escherichia coli K12 / genetics
  • Gene Ontology / statistics & numerical data*
  • Genome*
  • Humans
  • Mice
  • Multigene Family*
  • Saccharomyces cerevisiae / genetics