Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes

Genome Res. 2002 Sep;12(9):1408-17. doi: 10.1101/gr.255002.

Abstract

Evolutionarily conserved noncoding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. However, detecting and visualizing compositionally similar cis-element clusters in the context of conserved sequences is challenging. We have explored potential solutions and developed an algorithm and visualization method that combines the results of conserved sequence analyses (BLASTZ) with those of transcription factor binding site analyses (MatInspector) (http://trafac.chmcc.org). We define hits as the density of co-occurring cis-element transcription factor (TF)-binding sites measured within a 200-bp moving average window through phylogenetically conserved regions. The results are depicted as a Regulogram, in which the hit count is plotted as a function of position within each of the two genomic regions of the aligned orthologs. Within a high-scoring region, the relative arrangement of shared cis-elements within compositionally similar TF-binding site clusters is depicted in a Trafacgram. On the basis of analyses of several training data sets, the approach also allows for the detection of similarities in composition and relative arrangement of cis-element clusters within nonorthologous genes, promoters, and enhancers that exhibit coordinate regulatory properties. Known functional regulatory regions of nonorthologous and less-conserved orthologous genes frequently showed cis-element shuffling, demonstrating that compositional similarity can be more sensitive than sequence similarity. These results show that combining sequence similarity with cis-element compositional similarity provides a powerful aid for the identification of potential control regions.
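The hit definition above (shared TF-binding sites counted in a 200-bp moving window through conserved regions) can be illustrated with a minimal sketch. This is not the TRAFAC implementation, and it does not call BLASTZ or MatInspector. It assumes the following simplifications:

- Binding-site predictions for both orthologs are already available as hypothetical `(tf_name, position)` pairs.
- Positions have been mapped into one shared alignment coordinate system.
- A window's hit count is the number of sites whose TF occurs in both sequences within that window, taking the smaller per-TF count when a factor appears more than once.

The function name `shared_site_counts` and its parameters are illustrative only.

```python
from collections import Counter


def shared_site_counts(sites_a, sites_b, length, window=200, step=1, conserved=None):
    """Hypothetical sketch of a moving-window co-occurrence ("hit") profile.

    sites_a, sites_b: lists of (tf_name, position) in shared alignment coordinates
                      (an assumption; real orthologs need an alignment to map them).
    length:           length of the aligned region.
    conserved:        optional list of half-open (start, end) conserved intervals;
                      sites outside every interval are ignored.
    Returns a list of (window_start, hit_count) pairs, one per window.
    """
    def in_conserved(pos):
        return conserved is None or any(s <= pos < e for s, e in conserved)

    # Keep only sites that fall inside conserved regions.
    a = [(tf, p) for tf, p in sites_a if in_conserved(p)]
    b = [(tf, p) for tf, p in sites_b if in_conserved(p)]

    profile = []
    # Slide the window; if the region is shorter than the window, use one window at 0.
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        counts_a = Counter(tf for tf, p in a if start <= p < end)
        counts_b = Counter(tf for tf, p in b if start <= p < end)
        # Counter '&' keeps each TF with min(count in A, count in B),
        # i.e. only factors predicted in both sequences contribute.
        hits = sum((counts_a & counts_b).values())
        profile.append((start, hits))
    return profile


if __name__ == "__main__":
    sites_a = [("AP1", 10), ("SP1", 50), ("NFKB", 300)]
    sites_b = [("AP1", 15), ("SP1", 60), ("GATA", 120)]
    print(shared_site_counts(sites_a, sites_b, 400, window=200, step=100))
    # prints [(0, 2), (100, 0), (200, 0)]
```

Plotting `hit_count` against `window_start` gives a Regulogram-like curve for the aligned region. The brute-force rescan of every window keeps the sketch short. An incremental update, adding the sites that enter the window and removing those that leave, would scale better for long regions.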

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Composition / genetics
  • Binding Sites / physiology
  • Brain Chemistry / genetics
  • CD4 Antigens / genetics
  • Conserved Sequence / genetics
  • DNA Helicases*
  • DNA-Binding Proteins / genetics
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Humans
  • Intestinal Mucosa / metabolism
  • Intestines / chemistry
  • Lung / chemistry
  • Lung / metabolism
  • Mice
  • Multigene Family / genetics*
  • Organ Specificity / genetics
  • Promoter Regions, Genetic / genetics
  • Proteins / genetics
  • RNA, Untranslated / genetics
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Software
  • Transcription Factors / genetics
  • X-ray Repair Cross Complementing Protein 1
  • Xeroderma Pigmentosum Group D Protein

Substances

  • CD4 Antigens
  • DNA-Binding Proteins
  • Proteins
  • RNA, Untranslated
  • Transcription Factors
  • X-ray Repair Cross Complementing Protein 1
  • DNA Helicases
  • Xeroderma Pigmentosum Group D Protein
  • ERCC2 protein, human
  • Ercc2 protein, mouse