The value of prior knowledge in discovering motifs with MEME

T L Bailey; C Elkan

Proc Int Conf Intell Syst Mol Biol. 1995:3:21-9.

Authors

T L Bailey¹, C Elkan

Affiliation

¹ Department of Computer Science and Engineering, University of California at San Diego, La Jolla 92093-0114, USA.

PMID: 7584439

Abstract

MEME is a tool for discovering motifs in sets of protein or DNA sequences. This paper describes several extensions to MEME which increase its ability to find motifs in a totally unsupervised fashion, but which also allow it to benefit when prior knowledge is available. When no background knowledge is asserted, MEME obtains increased robustness from a method for determining motif widths automatically, and from probabilistic models that allow motifs to be absent in some input sequences. On the other hand, MEME can exploit prior knowledge about a motif being present in all input sequences, about the length of a motif and whether it is a palindrome, and (using Dirichlet mixtures) about expected patterns in individual motif positions. Extensive experiments are reported which support the claim that MEME benefits from, but does not require, background knowledge. The experiments use seven previously studied DNA and protein sequence families and 75 of the protein families documented in the Prosite database of sites and patterns, Release 11.1.
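
As an illustration of the palindrome prior mentioned in the abstract, the sketch below shows one simple way to force a DNA motif model to equal its own reverse complement. It is not taken from the paper or from MEME's source code: the function name `symmetrize_palindrome`, the column-wise list representation, and the plain averaging rule are all assumptions made for this example.

```python
# Illustrative sketch only: a hypothetical helper, not MEME's implementation.
# A motif is a list of columns; each column holds probabilities for A, C, G, T.
# With that ordering, the complement of base index b is 3 - b (A<->T, C<->G).

def symmetrize_palindrome(pfm):
    """Average a DNA motif with its reverse complement.

    After averaging, column i for base b equals column (width - 1 - i)
    for the complement of b, which is the palindrome constraint.
    """
    width = len(pfm)
    result = []
    for i, column in enumerate(pfm):
        mirror = pfm[width - 1 - i]  # matching column on the opposite strand
        result.append([(column[b] + mirror[3 - b]) / 2.0 for b in range(4)])
    return result


# Usage: a motif that strongly prefers "AA" is not palindromic, so averaging
# spreads the probability between A and T in both columns.
motif = [[1.0, 0.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0]]
palindromic = symmetrize_palindrome(motif)
```

Tying mirrored positions together in this way roughly halves the number of free parameters in the motif model, which is one reason a correct palindrome assumption can help when data are limited.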

Publication types

Comparative Study
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Amino Acid Sequence
Base Sequence
DNA / chemistry*
Models, Theoretical*
Pattern Recognition, Automated*
Proteins / chemistry*
Reproducibility of Results

Substances

Proteins
DNA

Grants and funding

HG00005/HG/NHGRI NIH HHS/United States