A new distance measure for comparing sequence profiles based on path lengths along an entropy surface

Bioinformatics. 2002;18 Suppl 2:S44-53. doi: 10.1093/bioinformatics/18.suppl_2.s44.

Abstract

We describe a new distance measure for comparing DNA sequence profiles. For this measure, each column in a multiple alignment is treated as a character frequency vector whose entries sum to one. The distance between two vectors is based on the minimum path length between them along an entropy surface. Path length is estimated by generating a random graph on the entropy surface and applying Dijkstra's algorithm for shortest paths from a single source. We use the new distance measure to analyze similarities within families of tandem repeats in the C. elegans genome and show that it gives a more accurate refinement of family relationships than a method based on comparing consensus sequences.
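The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the entropy surface is the graph of Shannon entropy over the frequency simplex, that random points are drawn uniformly from the simplex, that each point is joined to its k nearest neighbours, and that edge weights are straight-line distances between lifted points. All function names and the parameters `n_random`, `k`, and `seed` are hypothetical.

```python
import heapq
import math
import random


def entropy(p):
    """Shannon entropy in bits of a frequency vector (0 * log 0 taken as 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def lift(p):
    """Assumed surface embedding: append H(p) as an extra coordinate."""
    return tuple(p) + (entropy(p),)


def random_frequency_vector(rng, n=4):
    """Draw a point uniformly from the simplex (normalized exponentials)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def build_graph(points, k):
    """Undirected k-nearest-neighbour graph over the lifted points."""
    lifted = [lift(p) for p in points]
    adj = {i: {} for i in range(len(points))}
    for i, a in enumerate(lifted):
        nearest = sorted(
            (math.dist(a, b), j) for j, b in enumerate(lifted) if j != i
        )
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def dijkstra(adj, source):
    """Shortest path lengths from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def surface_distance(p, q, n_random=300, k=8, seed=0):
    """Estimate the path length between columns p and q along the surface.

    Returns math.inf if the random graph happens to leave p and q unconnected.
    """
    rng = random.Random(seed)
    points = [list(p), list(q)]
    points += [random_frequency_vector(rng, len(p)) for _ in range(n_random)]
    adj = build_graph(points, k)
    return dijkstra(adj, 0).get(1, math.inf)


if __name__ == "__main__":
    conserved = [1.0, 0.0, 0.0, 0.0]      # column that is always A
    mixed = [0.25, 0.25, 0.25, 0.25]      # column with no preference
    print(surface_distance(conserved, mixed))
```

Because the path is restricted to graph edges, the estimate can never be shorter than the straight line between the two lifted points, and it should tighten as `n_random` grows. Comparing two profiles would additionally require combining per-column distances, which this sketch leaves out.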

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Caenorhabditis elegans / genetics*
  • Chromosome Mapping / methods*
  • Consensus Sequence / genetics
  • Entropy
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Tandem Repeat Sequences / genetics*