tuple_plot: fast pairwise nucleotide sequence comparison with noise suppression

Bioinformatics. 2006 Aug 1;22(15):1917-8. doi: 10.1093/bioinformatics/btl277. Epub 2006 Jun 9.

Abstract

Summary: The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, by applying the well-known dotplot principle. A dictionary of sequence words built from the input sequences is used to construct a task-specific expectancy model, which assigns significance values to pairwise word hits. The dictionary-based approach allows fast computation: the computation time scales as O(N log N), where N is the size of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches.
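
The word-dictionary idea described in the summary can be illustrated with a minimal Python sketch. This is not the tuple_plot implementation: the function names, the default word length k, and the scoring formula are all assumptions made for illustration. Here a word hit is scored by the negative logarithm of the fraction of dotplot cells that the word would fill, so that hits from frequent (repetitive) words score lower than hits from rare words. The abstract does not specify the actual expectancy model.

```python
from collections import defaultdict
from math import log


def index_words(seq, k):
    """Map each k-letter word to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def scored_word_hits(seq_a, seq_b, k=4):
    """Return sorted (i, j, score) tuples, one per word shared at a[i], b[j].

    The score is a hypothetical stand-in for a significance value; it is
    NOT the expectancy model used by tuple_plot.
    """
    index_a = index_words(seq_a, k)
    index_b = index_words(seq_b, k)
    # Total number of dotplot cells (word start pairs) for this word length.
    cells = (len(seq_a) - k + 1) * (len(seq_b) - k + 1)
    hits = []
    for word, positions_a in index_a.items():
        positions_b = index_b.get(word)
        if not positions_b:
            continue  # word does not occur in the second sequence
        # Assumed score: rarer words fill fewer cells and score higher.
        score = -log(len(positions_a) * len(positions_b) / cells)
        for i in positions_a:
            for j in positions_b:
                hits.append((i, j, score))
    return sorted(hits)
```

Calling scored_word_hits("ACGTACGTTTGA", "ACGTGGTTGA", k=4) yields three hits. The two hits from the repeated word ACGT receive a lower score than the single hit from TTGA, which is the kind of noise suppression a frequency-aware scoring scheme aims for.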

Availability: tuple_plot is available at http://genome.fli-leibniz.de/software.html and may be used under the GNU public license.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artifacts
  • Base Sequence
  • Chromosome Mapping / methods*
  • Models, Genetic
  • Models, Statistical
  • Molecular Sequence Data
  • Nucleotides / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Software*
  • Stochastic Processes

Substances

  • Nucleotides