Automatic annotation of genomic regulatory sequences by searching for composite clusters

Pac Symp Biocomput. 2002:187-98. doi: 10.1142/9789812799623_0018.

Abstract

A new method was developed for revealing composite clusters of cis-elements in promoters of eukaryotic genes that are functionally related or coexpressed. A software system, "ClusterScan", has been created that makes it possible (i) to train the system on representative samples of promoters to reveal cis-elements that tend to cluster; (ii) to train the system on a number of samples of functionally related promoters to identify functionally coupled transcription factors; and (iii) to search for these clusters in genomic sequences in order to identify and functionally characterize regulatory regions in the genome. A number of training samples from different functional and structural groups of promoters were analysed. A search for composite clusters in human chromosomes 21 and 22 revealed a number of interesting examples. Finally, a decision tree system was constructed to classify promoters of several functionally related gene groups. The decision tree system makes it possible to identify new promoters and computationally predict their possible function.
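
The abstract does not describe how ClusterScan scores or detects clusters, so the following is only an illustrative sketch of the general idea of searching for a composite cluster: given predicted binding-site hits along a sequence, report every span no longer than a chosen window in which all factors of a required set occur together. The function name, the hit representation, the window size, and the factor names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A predicted site hit: (position in bp, factor name). Hypothetical format.
Hit = Tuple[int, str]


def find_composite_clusters(
    hits: Iterable[Hit], required: Set[str], window: int
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of at most `window` bp that contain
    at least one hit for every factor in `required`.

    Illustrative sliding-window search only; not the ClusterScan algorithm.
    """
    # Keep only hits for the factors of interest, ordered by position.
    relevant = sorted(h for h in hits if h[1] in required)
    counts: Dict[str, int] = {}
    clusters: List[Tuple[int, int]] = []
    left = 0
    for pos, name in relevant:
        counts[name] = counts.get(name, 0) + 1
        # Drop hits from the left until the span fits in the window.
        while pos - relevant[left][0] > window:
            left_name = relevant[left][1]
            counts[left_name] -= 1
            if counts[left_name] == 0:
                del counts[left_name]
            left += 1
        # All required factors are present in the current span.
        if len(counts) == len(required):
            clusters.append((relevant[left][0], pos))
    return clusters
```

Called with the made-up hits `[(10, "SP1"), (35, "NF-kB"), (60, "AP-1"), (400, "SP1"), (900, "AP-1"), (920, "NF-kB")]`, the required set `{"AP-1", "NF-kB"}`, and a 100 bp window, this sketch returns `[(35, 60), (900, 920)]`. Overlapping spans are reported separately rather than merged, which keeps the logic short at the cost of some redundancy in dense regions.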

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Automation
  • Cluster Analysis*
  • Databases, Factual
  • Decision Trees
  • Genomics*
  • Mammals / genetics
  • Promoter Regions, Genetic*
  • Regulatory Sequences, Nucleic Acid*
  • Software
  • Transcription Factors / genetics

Substances

  • Transcription Factors