A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences

J Biomed Inform. 2008 Feb;41(1):65-81. doi: 10.1016/j.jbi.2007.05.010. Epub 2007 Jun 27.

Abstract

A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure with a parallel genetic algorithm. The semantic similarity measure computes the strength of similarity between Gene Ontology terms. The parallel genetic algorithm then performs batch retrieval and accelerates the search through the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in a Gene Ontology browser named basic UTMGO. This addresses a weakness of existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of extended UTMGO is to provide a simple and practical tool, intended specifically for offline use, that produces better results within a reasonable running time and at low computing cost. Computational results and comparisons with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
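The abstract does not name the specific semantic similarity measure or the genetic algorithm design used in UTMGO. The sketch below only illustrates the general idea of pairing the two, and it rests on several assumptions that are not taken from the paper:

- Resnik's information-content measure is swapped in as the similarity score.
- The DAG and the annotation counts are hypothetical toy data, not the real Gene Ontology.
- The fitness of a candidate group is assumed to be its mean similarity to a query term.
- The "parallel" genetic algorithm is modelled as an island GA with ring migration. The islands run sequentially here for clarity.

Names such as `island_ga` and `T0`..`T6` are illustrative only.

```python
import math
import random

# Hypothetical toy DAG standing in for the Gene Ontology: term -> parent terms.
PARENTS = {
    "T0": [], "T1": ["T0"], "T2": ["T0"], "T3": ["T1"],
    "T4": ["T1"], "T5": ["T2"], "T6": ["T3", "T4"],
}
# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {t: 1 for t in PARENTS}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


ANCESTORS = {t: ancestors(t) for t in PARENTS}


def information_content():
    """IC(t) = -ln p(t), where p(t) counts annotations to t or any descendant."""
    cumulative = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ANCESTORS[term]:
            cumulative[anc] += n
    total = sum(DIRECT_COUNTS.values())  # equals the root's cumulative count
    return {t: -math.log(c / total) for t, c in cumulative.items()}


IC = information_content()


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[t] for t in ANCESTORS[a] & ANCESTORS[b])


def fitness(group, query):
    """Assumed fitness: mean semantic similarity of the group to the query."""
    return sum(resnik(query, t) for t in group) / len(group)


def crossover(a, b, k, rng):
    """Child takes k distinct terms drawn from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))


def mutate(group, pool, rng):
    """Swap one member for a random term not already in the group."""
    out = list(group)
    choices = [t for t in pool if t not in out]
    out[rng.randrange(len(out))] = rng.choice(choices)
    return tuple(sorted(out))


def island_ga(query, k=2, islands=3, size=8, generations=20,
              migrate_every=5, seed=0):
    """Island-model GA searching for k terms most similar to `query`."""
    rng = random.Random(seed)
    pool = [t for t in PARENTS if t != query]

    def score(g):
        return fitness(g, query)

    pops = [[tuple(sorted(rng.sample(pool, k))) for _ in range(size)]
            for _ in range(islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(pops):
            pop.sort(key=score, reverse=True)
            parents = pop[: size // 2]  # truncation selection keeps the elite
            children = []
            while len(parents) + len(children) < size:
                a, b = rng.sample(parents, 2)
                child = crossover(a, b, k, rng)
                if rng.random() < 0.3:
                    child = mutate(child, pool, rng)
                children.append(child)
            pops[i] = parents + children
        if gen % migrate_every == 0:
            # Ring migration: each island's worst is replaced by its
            # neighbour's best individual.
            bests = [max(p, key=score) for p in pops]
            for i, pop in enumerate(pops):
                pop.sort(key=score)
                pop[0] = bests[i - 1]
    return max((g for p in pops for g in p), key=score)
```

A usage example is `best = island_ga("T6")`, followed by `fitness(best, "T6")`. In a real deployment each island could run in its own process or thread, with migration as the only communication step. That coarse-grained structure is what makes island GAs a common way to parallelise the search.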

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Database Management Systems*
  • Databases, Protein*
  • Information Storage and Retrieval / methods
  • Molecular Sequence Data
  • Natural Language Processing
  • Pattern Recognition, Automated / methods
  • Proteins / chemistry*
  • Proteins / classification*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid

Substances

  • Proteins