Proc Int Conf Intell Syst Mol Biol, 2, 28-36

Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers

  • PMID: 7584402

T L Bailey et al. Proc Int Conf Intell Syst Mol Biol.

Abstract

The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
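The core idea of the abstract, fitting a two-component mixture (motif versus background) to all width-W subsequences by expectation maximization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`wmers`, `fit_motif`, `consensus`), the seed-based initialization, the pseudocount, the initial mixing weight, and the toy sequences are all assumptions made for illustration, and the erasing step used to find further motifs is omitted.

```python
import math

ALPHABET = "ACGT"

def wmers(seqs, w):
    """All overlapping width-w windows as (window, sequence index, offset)."""
    return [(s[i:i + w], n, i)
            for n, s in enumerate(seqs)
            for i in range(len(s) - w + 1)]

def fit_motif(seqs, w, seed, n_iter=50, pseudo=0.1):
    """EM for a two-component mixture: position-specific motif vs. background.

    Hypothetical sketch: the seed initialization and pseudocount are
    assumptions, not taken from the paper.
    """
    data = wmers(seqs, w)
    # Motif model: one letter distribution per column, biased toward the seed.
    theta = [{a: (0.7 if a == seed[j] else 0.1) for a in ALPHABET}
             for j in range(w)]
    bg = {a: 0.25 for a in ALPHABET}   # background letter frequencies
    lam = 0.1                          # assumed initial mixing weight
    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif occurrence.
        z = []
        for x, _, _ in data:
            p_m = lam * math.prod(theta[j][c] for j, c in enumerate(x))
            p_b = (1 - lam) * math.prod(bg[c] for c in x)
            z.append(p_m / (p_m + p_b))
        # M-step: re-estimate parameters from responsibility-weighted counts.
        lam = sum(z) / len(z)
        m_counts = [{a: pseudo for a in ALPHABET} for _ in range(w)]
        b_counts = {a: pseudo for a in ALPHABET}
        for (x, _, _), zi in zip(data, z):
            for j, c in enumerate(x):
                m_counts[j][c] += zi
                b_counts[c] += 1 - zi
        theta = [{a: col[a] / sum(col.values()) for a in ALPHABET}
                 for col in m_counts]
        total = sum(b_counts.values())
        bg = {a: b_counts[a] / total for a in ALPHABET}
    return theta, bg, lam, list(zip(data, z))

def consensus(theta):
    """Most probable letter in each motif column."""
    return "".join(max(col, key=col.get) for col in theta)

# Toy data (made up): the motif TATAAT planted in G/C-rich flanks.
seqs = ["GCGCTATAATGCCG", "CCGGTATAATCGGC",
        "GGCCGCTATAATGC", "CGTATAATGGCCGC"]
theta, bg, lam, scored = fit_motif(seqs, w=6, seed="TATAAT")

# Best-scoring offset per sequence gives an alignment of the occurrences.
best = {}
for (x, n, i), zi in scored:
    if n not in best or zi > best[n][1]:
        best[n] = (i, zi)
print(consensus(theta), {n: off for n, (off, _) in best.items()})
```

Under this model, a window's responsibility exceeds 0.5 exactly when its motif-to-background log-odds exceeds log((1 - lam) / lam), which is one way to read the abstract's pairing of a motif model with a threshold for classification.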

Similar articles

  • Discovering Novel Sequence Motifs With MEME
    TL Bailey. Curr Protoc Bioinformatics Chapter 2, Unit 2.4. PMID 18792935.
    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly …
  • Discriminative Motif Discovery in DNA and Protein Sequences Using the DEME Algorithm
    E Redhead et al. BMC Bioinformatics 8, 385. PMID 17937785.
    Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in t …
  • A Sequential Method for Discovering Probabilistic Motifs in Proteins
    K Blekas et al. Methods Inf Med 43 (1), 9-12. PMID 15026827.
    The proposed greedy algorithm constitutes a promising approach for discovering multiple probabilistic motifs in biological sequences. By using an effective incremental mi …
  • Discovering Sequence Motifs
    TL Bailey. Methods Mol Biol 452, 231-51. PMID 18566768. - Review
    Sequence motif discovery algorithms are an important part of the computational biologist's toolkit. The purpose of motif discovery is to discover patterns in biopolymer ( …
  • Discovering Sequence Motifs
    TL Bailey. Methods Mol Biol 395, 271-92. PMID 17993680. - Review
    Sequence motif discovery algorithms are an important part of the computational biologist's toolkit. The purpose of motif discovery is to discover patterns in biopolymer ( …

Cited by 1,842 PubMed Central articles
