Pattern recognition and probabilistic measures in alignment-free sequence analysis

Isabel Schwende; Tuan D Pham

doi:10.1093/bib/bbt070

Brief Bioinform. 2014 May;15(3):354-68. doi: 10.1093/bib/bbt070. Epub 2013 Oct 3.

Authors

Isabel Schwende¹, Tuan D Pham

Affiliation

¹ PhD, Aizu Research Cluster for Medical Informatics and Engineering (ARC-Medical), Research Center for Advanced Information Science and Technology (CAIST), The University of Aizu, Aizuwakamatsu, Fukushima 965-8580, Japan. tdpham@u-aizu.ac.jp.

PMID: 24096012
DOI: 10.1093/bib/bbt070

Abstract

With the massive production of genomic and proteomic data, the number of biological sequences available in databases has reached a level at which exact alignment is no longer feasible, even when only a fraction of all sequences is used. To overcome this time complexity, ultrafast alignment-free methods are being studied. Within the past two decades, a broad variety of alignment-free methods have been proposed, including dissimilarity measures on classical representations of sequences such as k-words or Markov models. Furthermore, articles have been published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real-world applications in biological sequence analysis, and the time-efficient alignment-free approaches are usually applied in cases where the customary algorithms fail or prove too inconvenient.
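
As an illustration of the k-word representation mentioned above, the following is a minimal sketch in Python, not a method taken from the review itself. It counts overlapping k-words in two DNA sequences, normalizes the counts to relative frequencies, and compares the two frequency vectors with the Euclidean distance. The function names `kword_frequencies` and `kword_distance`, the default alphabet `ACGT`, and the choice of Euclidean distance are all assumptions made for this example; many other dissimilarity measures can be applied to the same vectors.

```python
from collections import Counter
from itertools import product
import math


def kword_frequencies(seq, k, alphabet="ACGT"):
    """Return relative frequencies of all overlapping k-words in seq.

    Hypothetical helper: words containing symbols outside `alphabet`
    (for example 'N') are ignored rather than counted.
    """
    # Enumerate every possible k-word in a fixed order so that vectors
    # from different sequences are directly comparable.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of length k over the sequence, one position at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid k-words")
    return [counts[w] / total for w in words]


def kword_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the k-word frequency vectors.

    Assumption: Euclidean distance is used here only as one simple
    example of a dissimilarity measure on k-word vectors.
    """
    fa = kword_frequencies(seq_a, k)
    fb = kword_frequencies(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    # Two short made-up sequences that differ at a single position.
    print(kword_distance("ACGTACGTAC", "ACGTTCGTAC", k=2))
```

Each sequence is scanned once, so the cost grows linearly with sequence length, whereas pairwise alignment by dynamic programming grows with the product of the two lengths. This difference is the main reason such measures are attractive for large databases.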

Keywords: alignment-free; distance measures; distortion measures; pattern classification; sequence comparison; signal processing.

Publication types

Review

MeSH terms

Computational Biology / methods*
Genomics / statistics & numerical data
Markov Chains
Models, Statistical
Pattern Recognition, Automated / methods*
Phylogeny
Proteomics / statistics & numerical data
Sequence Alignment
Sequence Analysis / methods*
Sequence Analysis / statistics & numerical data
Software