SEAM: a Stochastic EM-type Algorithm for Motif-finding in biopolymer sequences

J Bioinform Comput Biol. 2007 Feb;5(1):47-77. doi: 10.1142/s0219720007002527.

Abstract

Position weight matrix-based statistical modeling for the identification and characterization of motif sites in a set of unaligned biopolymer sequences is presented. This paper describes and implements a new algorithm, the Stochastic EM-type Algorithm for Motif-finding (SEAM), and redesigns and implements the EM-based motif-finding algorithm called deterministic EM (DEM) for comparison with SEAM, its stochastic counterpart. The gold standard example, cyclic adenosine monophosphate receptor protein (CRP) binding sequences, together with other biological sequences, is used to illustrate the performance of the new algorithm and compare it with other popular motif-finding programs. The convergence of the new algorithm is shown by simulation. The in silico experiments using simulated and biological examples illustrate the power and robustness of the new algorithm SEAM in de novo motif discovery.
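
As a rough illustration of the idea behind a stochastic EM-type motif search with a position weight matrix (PWM), the sketch below alternates between re-estimating the PWM from the current site choices and sampling one new site per sequence in proportion to its motif-versus-background likelihood ratio. It is not the SEAM implementation described in the paper: the one-site-per-sequence model, the uniform background, the pseudocount value, the DNA alphabet, and all function names (`estimate_pwm`, `site_weights`, `stochastic_em`) are assumptions made here for illustration only.

```python
import random

ALPHABET = "ACGT"  # assumed DNA alphabet; a protein alphabet would work the same way


def estimate_pwm(seqs, starts, width, pseudo=0.5):
    """Estimate column-wise letter frequencies from the current sites (with pseudocounts)."""
    pwm = []
    for j in range(width):
        counts = {a: pseudo for a in ALPHABET}
        for s, p in zip(seqs, starts):
            counts[s[p + j]] += 1
        total = sum(counts.values())
        pwm.append({a: counts[a] / total for a in ALPHABET})
    return pwm


def site_weights(seq, pwm, background):
    """Likelihood ratio (motif vs. background) for every possible start position."""
    width = len(pwm)
    weights = []
    for p in range(len(seq) - width + 1):
        w = 1.0
        for j in range(width):
            a = seq[p + j]
            w *= pwm[j][a] / background[a]
        weights.append(w)
    return weights


def stochastic_em(seqs, width, iters=200, seed=0):
    """Alternate PWM estimation with random sampling of one site per sequence."""
    rng = random.Random(seed)
    background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}  # assumed uniform background
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        pwm = estimate_pwm(seqs, starts, width)
        # Stochastic step: draw a site at random, weighted by its likelihood ratio,
        # instead of averaging over all sites as a deterministic EM would.
        starts = [
            rng.choices(range(len(s) - width + 1),
                        weights=site_weights(s, pwm, background))[0]
            for s in seqs
        ]
    return starts, estimate_pwm(seqs, starts, width)
```

A deterministic EM variant would, under the same assumptions, use the normalized weights from `site_weights` as fractional counts when updating the PWM rather than sampling a single site per sequence.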

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Artificial Intelligence*
  • Binding Sites
  • Biopolymers / chemistry*
  • Data Interpretation, Statistical
  • Likelihood Functions
  • Markov Chains
  • Molecular Sequence Data
  • Protein Binding
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Software
  • Stochastic Processes

Substances

  • Biopolymers
  • Proteins