A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Bioinformatics. 2004 Feb 12;20(3):381-8. doi: 10.1093/bioinformatics/btg420. Epub 2004 Jan 22.

Abstract

Motivation: DNA microarray technologies allow genome-wide transcription to be quantified in parallel, offering an opportunity to study complicated biological phenomena systematically. Clustering methods have become useful tools for uncovering meaningful patterns hidden in the resulting gene expression data. Because these mathematical techniques rely entirely on numerical expression values, however, they do not by themselves provide biologically relevant information about the resulting clusters.

Results: We present a novel methodology for the biological interpretation of gene clusters. Our graph-theoretic algorithm extracts the common biological attributes of the genes within a cluster or a group of interest using a modified structure of the Gene Ontology (GO), called the GO tree. After genes are annotated with GO terms, the hierarchical nature of those terms is used to find the representative biological meanings of the gene clusters. In addition, the biological significance of gene clusters can be assessed quantitatively by defining a distance function on the GO tree. Our approach complements statistical clustering techniques: the biological ontology lets us view clustering problems from a different perspective. We applied the algorithm to a well-known data set and successfully obtained the biological features of the gene clusters, together with a quantitative assessment of clustering quality based on GO Biological Process.
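
The two ideas described in the Results, a representative term for a cluster and a distance function on the GO tree, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a toy tree with invented term names, one annotation per gene, the deepest common ancestor as the "representative" term, and edge count as the distance. The names `PARENT`, `common_ancestor`, `tree_distance`, and `mean_pairwise_distance` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy tree: child term -> parent term (not real GO identifiers).
PARENT = {
    "metabolism": "biological_process",
    "cell_cycle": "biological_process",
    "carbohydrate_metabolism": "metabolism",
    "lipid_metabolism": "metabolism",
    "glycolysis": "carbohydrate_metabolism",
    "mitosis": "cell_cycle",
}


def path_to_root(term):
    """Return the list of terms from `term` up to the root, inclusive."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_ancestor(terms):
    """Deepest term that is an ancestor of (or equal to) every given term."""
    paths = [path_to_root(t) for t in terms]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first term, the first shared node is the deepest.
    return next(t for t in paths[0] if t in shared)


def tree_distance(a, b):
    """Number of edges on the path between two terms (assumed distance)."""
    lca = common_ancestor([a, b])
    return path_to_root(a).index(lca) + path_to_root(b).index(lca)


def mean_pairwise_distance(terms):
    """Average tree distance over all pairs of terms in a cluster."""
    pairs = list(combinations(terms, 2))
    if not pairs:  # fewer than two terms: no pairs to compare
        return 0.0
    return sum(tree_distance(a, b) for a, b in pairs) / len(pairs)


# One annotation per gene is assumed here for simplicity.
cluster = ["glycolysis", "lipid_metabolism", "carbohydrate_metabolism"]
print(common_ancestor(cluster))
print(mean_pairwise_distance(cluster))
```

Under these assumptions, a cluster whose terms share a deep common ancestor and have a small mean pairwise distance would be read as biologically coherent. A real implementation would also have to handle genes with multiple annotations and the fact that GO itself is a directed acyclic graph rather than a tree.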

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Database Management Systems
  • Databases, Genetic*
  • Gene Expression Profiling / methods*
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Pattern Recognition, Automated
  • Proteins / classification*
  • Proteins / genetics*
  • Reproducibility of Results
  • Saccharomyces cerevisiae Proteins / classification
  • Saccharomyces cerevisiae Proteins / genetics
  • Sensitivity and Specificity

Substances

  • Proteins
  • Saccharomyces cerevisiae Proteins