Finding motifs in the twilight zone

Bioinformatics. 2002 Oct;18(10):1374-81. doi: 10.1093/bioinformatics/18.10.1374.

Abstract

Motivation: Gene activity is often affected by the binding of transcription factors to short fragments of DNA sequence called motifs. Identifying subtle regulatory motifs in a DNA sequence is a difficult pattern recognition problem. In this paper we design a new motif-finding algorithm that can detect very subtle motifs.

Results: We introduce the notion of a multiprofile and use it for finding subtle motifs in DNA sequences. Multiprofiles generalize the notion of a profile and allow one to detect subtle patterns that escape detection by standard profiles. Our MULTIPROFILER algorithm outperforms other leading motif-finding algorithms on a number of synthetic models. Moreover, it can be shown that for some previously studied motif models, MULTIPROFILER pushes the performance envelope to its theoretical limits.
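
For readers unfamiliar with the standard profile that multiprofiles generalize, the following is a minimal sketch of an ordinary profile only. The abstract does not define a multiprofile, so none is attempted here, and this is not the MULTIPROFILER algorithm. The function names (build_profile, score, best_match), the pseudocount value, and the example sequences are all illustrative assumptions.

```python
from collections import Counter

ALPHABET = "ACGT"


def build_profile(instances, pseudocount=1.0):
    """Build a standard profile from aligned, equal-length motif instances.

    Returns one dict per motif position mapping each nucleotide to its
    smoothed frequency. The pseudocount is an assumed choice that keeps
    unseen nucleotides from getting probability zero.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    total = len(instances) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in instances)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def score(profile, lmer):
    """Score an l-mer as the product of its per-position frequencies."""
    p = 1.0
    for column, base in zip(profile, lmer):
        p *= column[base]
    return p


def best_match(profile, sequence):
    """Return the highest-scoring window of the profile's length."""
    l = len(profile)
    windows = [sequence[i:i + l] for i in range(len(sequence) - l + 1)]
    return max(windows, key=lambda w: score(profile, w))


# Hypothetical usage with made-up instances.
profile = build_profile(["ACGT", "ACGA", "ACGT"])
print(best_match(profile, "TTACGTTT"))
```

A profile of this kind scores each candidate window independently, which is why a motif whose instances differ from each other at many positions can be hard to distinguish from background.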

Availability: http://www-cse.ucsd.edu/groups/bioinformatics/software.html

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics*
  • Base Sequence
  • Benchmarking
  • Consensus Sequence / genetics
  • DNA / genetics
  • DNA-Binding Proteins / genetics
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Molecular Sequence Data
  • Promoter Regions, Genetic / genetics
  • Quality Control
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA-Binding Proteins
  • DNA