An approach to identify over-represented cis-elements in related sequences

Nucleic Acids Res. 2003 Apr 1;31(7):1995-2005. doi: 10.1093/nar/gkg287.

Abstract

Computational identification of transcription factor binding sites is an important research area in computational biology. The positional weight matrix (PWM) is a model that describes the sequence pattern of binding sites. Transcription factor binding site prediction methods based on PWMs usually require user-defined thresholds. The arbitrary threshold and the relatively low specificity of the algorithm make the results of such an analysis difficult to interpret properly. In this study, a method was developed to identify over-represented cis-elements using PWM-based similarity scores. Three sets of closely related promoters were analyzed, and only over-represented motifs with high PWM similarity scores were reported. The thresholds for evaluating similarity scores against the PWMs of putative transcription factor binding sites are determined automatically during the analysis and can be reused in further research with the same PWMs. The online program is available at http://www.bioinfo.tsinghua.edu.cn/~zhengjsh/OTFBS/.
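The general idea summarized above (score sequence windows against a PWM, then ask whether high-scoring windows are more frequent in a promoter set than in background sequence) can be illustrated with a minimal Python sketch. This is not the authors' OTFBS implementation. The log-odds matrix with a pseudocount, the min-max rescaled similarity score, the binomial tail test, and the threshold scan are all assumptions chosen for illustration, and every function name is hypothetical. Sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds matrix (assumed uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def window_scores(pwm, sequence):
    """Raw PWM score of every window of width len(pwm) along the sequence."""
    width = len(pwm)
    return [sum(pwm[i][sequence[start + i]] for i in range(width))
            for start in range(len(sequence) - width + 1)]

def similarity(pwm, score):
    """Rescale a raw score to [0, 1] between the matrix minimum and maximum."""
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return (score - lo) / (hi - lo)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def count_hits(pwm, sequences, threshold):
    """Return (windows with similarity >= threshold, total windows scored)."""
    sims = [similarity(pwm, s) for seq in sequences for s in window_scores(pwm, seq)]
    return sum(1 for s in sims if s >= threshold), len(sims)

def over_representation(pwm, promoters, background_seqs, threshold):
    """Return (hits, windows, p_value) for the promoter set at one threshold.

    The per-window hit probability is estimated from the background
    sequences with add-one smoothing (an illustrative assumption).
    """
    hits, windows = count_hits(pwm, promoters, threshold)
    bg_hits, bg_windows = count_hits(pwm, background_seqs, threshold)
    p_hit = (bg_hits + 1) / (bg_windows + 2)
    return hits, windows, binomial_tail(hits, windows, p_hit)

def pick_threshold(pwm, promoters, background_seqs, candidates):
    """Choose the candidate threshold with the smallest tail probability."""
    return min(candidates,
               key=lambda t: over_representation(pwm, promoters, background_seqs, t)[2])

# Toy data: a hypothetical TATA-like count matrix and made-up sequences.
consensus = "TATA"
counts = [{b: (8 if b == c else 0) for b in BASES} for c in consensus]
pwm = log_odds_pwm(counts)
promoters = ["GGTATAGG", "CCTATACC"]
background_seqs = ["GGGGCCCC", "ACGTACGT"]

hits, windows, p_value = over_representation(pwm, promoters, background_seqs, 0.9)
best = pick_threshold(pwm, promoters, background_seqs, [0.4, 0.9])
print(hits, windows, p_value, best)
```

Scanning candidate thresholds and keeping the one that gives the strongest over-representation is one simple way to avoid a user-defined cutoff. A realistic analysis would need far larger background sets and a correction for testing many matrices and thresholds.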

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Actins / genetics
  • Animals
  • Binding Sites / genetics
  • DNA / genetics
  • DNA / metabolism
  • Hemoglobins / genetics
  • Humans
  • Interferons / genetics
  • Promoter Regions, Genetic / genetics
  • Protein Binding
  • Software*
  • Transcription Factors / metabolism*

Substances

  • Actins
  • Hemoglobins
  • Transcription Factors
  • DNA
  • Interferons