Phylogenetic motif detection by expectation-maximization on evolutionary mixtures

Pac Symp Biocomput. 2004;324-35. doi: 10.1142/9789812704856_0031.

Abstract

The preferential conservation of transcription factor binding sites implies that non-coding sequence data from related species will prove a powerful asset to motif discovery. We present a unified probabilistic framework for motif discovery that incorporates evolutionary information. We treat aligned DNA sequence as a mixture of evolutionary models, for motif and background, and, following the example of the MEME program, provide an algorithm to estimate the parameters by Expectation-Maximization. We examine a variety of evolutionary models and show that our approach can take advantage of phylogenetic information to avoid false positives and discover motifs upstream of groups of characterized target genes. We compare our method to traditional motif finding restricted to conserved regions. An implementation will be made available at http://rana.lbl.gov.
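
The mixture idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a star phylogeny, ungapped alignment windows, and an F81-like substitution model in which each species keeps the ancestral base with probability `retain` and otherwise redraws it from the model's base frequencies. All function and parameter names here are hypothetical. To stay short, the sketch holds the motif and background models fixed and lets EM update only the mixing weight `pi`; the full method would also re-estimate the motif parameters in the M-step.

```python
BASES = "ACGT"

def column_likelihood(column, freqs, retain):
    """P(one aligned column) under an assumed F81-like star phylogeny.

    `column` is a string with one base per species. The ancestral base is
    drawn from `freqs`; each species keeps it with probability `retain`,
    otherwise its base is redrawn from `freqs`.
    """
    total = 0.0
    for anc in BASES:
        p = freqs[anc]
        for base in column:
            kept = retain if base == anc else 0.0
            p *= kept + (1.0 - retain) * freqs[base]
        total += p
    return total

def window_likelihood(window, position_freqs, retain):
    """Product of column likelihoods, one frequency dict per column."""
    lik = 1.0
    for column, freqs in zip(window, position_freqs):
        lik *= column_likelihood(column, freqs, retain)
    return lik

def em_mixing_weight(windows, motif_freqs, bg_freqs,
                     retain_motif, retain_bg, pi=0.1, n_iter=50):
    """EM over a two-component mixture (motif vs. background).

    Only the mixing weight `pi` is re-estimated; both evolutionary
    models are held fixed (a simplification made for this sketch).
    """
    width = len(motif_freqs)
    lik_motif = [window_likelihood(w, motif_freqs, retain_motif)
                 for w in windows]
    lik_bg = [window_likelihood(w, [bg_freqs] * width, retain_bg)
              for w in windows]
    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif instance.
        z = [pi * m / (pi * m + (1.0 - pi) * b)
             for m, b in zip(lik_motif, lik_bg)]
        # M-step: the new mixing weight is the mean posterior.
        pi = sum(z) / len(z)
    return pi, z

def peaked(base, high=0.85):
    """Frequency dict concentrated on one base (illustrative values)."""
    low = (1.0 - high) / 3.0
    return {b: (high if b == base else low) for b in BASES}

# Toy data: two species, windows of width 3, given as lists of columns.
motif_freqs = [peaked("A"), peaked("C"), peaked("G")]
bg_freqs = {b: 0.25 for b in BASES}
windows = [
    ["AA", "CC", "GG"],   # conserved match to the assumed motif
    ["AT", "GC", "TA"],   # diverged, no match
    ["TT", "TT", "TT"],   # conserved, but not the motif
    ["CA", "GT", "AC"],   # diverged, no match
]
pi, z = em_mixing_weight(windows, motif_freqs, bg_freqs,
                         retain_motif=0.9, retain_bg=0.5)
```

In this toy setting the third window is perfectly conserved yet receives a low motif posterior, because conservation alone does not match the motif model. This mirrors, in miniature, the abstract's contrast with motif finding restricted to conserved regions.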

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology*
  • DNA, Fungal / genetics
  • DNA-Binding Proteins / genetics
  • Evolution, Molecular*
  • Fungal Proteins / genetics
  • Likelihood Functions
  • Models, Genetic
  • Models, Statistical
  • Phylogeny*
  • Saccharomyces / genetics
  • Software

Substances

  • DNA, Fungal
  • DNA-Binding Proteins
  • Fungal Proteins