TrieAMD: a scalable and efficient apriori motif discovery approach

Isra Al-Turaiki; Ghada Badr; Hassan Mathkour

doi:10.1504/ijdmb.2015.070833

Int J Data Min Bioinform. 2015;13(1):13-30. doi: 10.1504/ijdmb.2015.070833.

PMID: 26529905

Abstract

Motif discovery is the problem of finding recurring patterns in biological sequences, and it is one of the hardest and longest-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for discovering frequent patterns in large datasets. In this paper, we apply the Apriori algorithm to motif discovery, using a trie data structure, and propose several modifications that adapt the classic algorithm to this problem. We evaluate the resulting algorithm, Trie-based Apriori Motif Discovery (TrieAMD), on Tompa's benchmark. On real datasets, TrieAMD achieves a higher average sensitivity than all of the tested tools, which means that it is able to discover more motifs, while its specificity is comparable to that of the other tools. The results also confirm that the algorithm scales linearly in both time and space.
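
The abstract does not spell out the modifications made to Apriori, so the following is only a minimal sketch of the general idea it names: a level-wise, Apriori-style search whose candidates are stored in a trie. It is not the TrieAMD algorithm. It assumes exact substring matching, defines support as the number of sequences containing a pattern, and prunes only on the prefix of a candidate. The names `TrieNode`, `frequent_patterns`, `min_support`, and `max_len` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Set


class TrieNode:
    """One node per pattern; `seqs` holds ids of the sequences that contain it."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.seqs: Set[int] = set()


def frequent_patterns(sequences: List[str], min_support: int, max_len: int) -> Dict[str, int]:
    """Return exact substrings (up to max_len) found in >= min_support sequences.

    Illustrative assumption: a candidate of length k is counted only if its
    prefix of length k-1 survived the previous level (Apriori-style pruning).
    """
    root = TrieNode()
    result: Dict[str, int] = {}
    frontier = [("", root)]  # patterns that were frequent at the previous level

    for length in range(1, max_len + 1):
        # Counting pass: extend surviving prefixes by one symbol.
        for sid, seq in enumerate(sequences):
            for start in range(len(seq) - length + 1):
                node: Optional[TrieNode] = root
                for ch in seq[start:start + length - 1]:
                    node = node.children.get(ch)
                    if node is None:  # prefix was pruned, skip this window
                        break
                if node is None:
                    continue
                last = seq[start + length - 1]
                node.children.setdefault(last, TrieNode()).seqs.add(sid)

        # Pruning pass: keep frequent children, delete the rest from the trie.
        next_frontier = []
        for pattern, node in frontier:
            for ch in list(node.children):
                child = node.children[ch]
                if len(child.seqs) >= min_support:
                    result[pattern + ch] = len(child.seqs)
                    next_frontier.append((pattern + ch, child))
                else:
                    del node.children[ch]
        frontier = next_frontier
        if not frontier:
            break

    return result


if __name__ == "__main__":
    toy = ["ACGTAC", "TACGA", "GGACG"]  # toy input, not benchmark data
    print(frequent_patterns(toy, min_support=3, max_len=4))
```

Because infrequent nodes are deleted after each level, the trie only ever holds patterns that can still be extended, which is what keeps the candidate set small. A practical motif finder would additionally need to tolerate mismatches, which this sketch does not attempt.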

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Amino Acid Motifs
Data Mining / methods*
Databases, Protein*
Proteins / chemistry
Proteins / genetics*
Sequence Analysis, Protein / methods*
Software*

Substances

Proteins