The generalised k-Truncated Suffix Tree for time- and space-efficient searches in multiple DNA or protein sequences

Marcel H Schulz; Sebastian Bauer; Peter N Robinson

doi:10.1504/IJBRA.2008.017165

Int J Bioinform Res Appl. 2008;4(1):81-95. doi: 10.1504/IJBRA.2008.017165.

Authors

Marcel H Schulz¹, Sebastian Bauer, Peter N Robinson

Affiliation

¹ Institut für Medizinische Genetik, Charité Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany. marcel.schulz@molgen.mpg.de

PMID: 18283030
DOI: 10.1504/IJBRA.2008.017165

Abstract

Efficient searching for specific subsequences in a set of longer sequences is an important component of many bioinformatics algorithms. Generalised suffix trees and suffix arrays allow searches for a pattern of length n in time proportional to n, independent of the length of the sequences, and are thus attractive for a variety of applications. Here, we present an algorithm termed the generalised k-Truncated Suffix Tree (kTST), which is an adaptation of Ukkonen's linear-time suffix tree construction algorithm. The kTST algorithm creates a k-deep tree in linear time that allows rapid searches for short patterns of up to k characters. Compared to other suffix-based algorithms, the kTST can offer advantages in computational time and memory usage when searching for short patterns in DNA or protein sequences.
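
The idea of a depth-limited index over several sequences can be illustrated with a minimal sketch. This is not the authors' algorithm: it builds a naive, uncompressed trie by inserting the first k characters of every suffix, which costs time and memory proportional to k times the total sequence length, whereas the kTST adapts Ukkonen's construction. The names `Node`, `build_truncated_trie` and `find` are hypothetical and chosen only for this illustration.

```python
class Node:
    """One trie node: child links plus the places where its path label occurs."""

    def __init__(self):
        self.children = {}     # character -> child Node
        self.occurrences = []  # (sequence index, start offset) pairs


def build_truncated_trie(sequences, k):
    """Insert the first k characters of every suffix of every sequence.

    Naive illustration only (an assumption of this sketch): the trie is not
    edge-compressed and is not built with Ukkonen's algorithm.
    """
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + k]:
                node = node.children.setdefault(ch, Node())
                node.occurrences.append((seq_id, start))
    return root


def find(root, pattern):
    """Return all (sequence index, start offset) matches of pattern.

    The walk takes one step per pattern character, so its cost depends on
    the pattern length and not on the sequence lengths. Patterns longer
    than k cannot be matched, because the trie is only k levels deep.
    """
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.occurrences
```

For example, after `root = build_truncated_trie(["ACGTAC", "TACG"], 3)`, the call `find(root, "AC")` reports two occurrences in the first sequence and one in the second, while a four-character pattern such as "ACGT" returns no matches because it exceeds k = 3.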

MeSH terms

Algorithms*
Amino Acid Sequence
Base Sequence
Molecular Sequence Data
Pattern Recognition, Automated / methods*
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Sequence Analysis, Protein / methods*