Applications and statistics for multiple high-scoring segments in molecular sequences

Proc Natl Acad Sci U S A. 1993 Jun 15;90(12):5873-7. doi: 10.1073/pnas.90.12.5873.


Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence database search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order. This paper describes the statistical distribution for the sum of the scores of multiple high-scoring segments and illustrates its application to the identification of possible transmembrane segments and the evaluation of sequence similarity.
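
To make the notion of "the sum of the scores of multiple high-scoring segments" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure from the paper: the function names `best_segment` and `top_segments`, the per-position scores, and the greedy extract-and-split strategy are all hypothetical choices. It finds the best-scoring contiguous segment with Kadane's algorithm, sets it aside, and repeats on the remaining flanks until r disjoint segments are found. The abstract's central result, the chance distribution of the resulting sum, is not reproduced here.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, int, int]  # (score, start, end) with end exclusive


def best_segment(scores: Sequence[float], lo: int, hi: int) -> Segment:
    """Kadane's algorithm restricted to scores[lo:hi].

    Returns (0.0, lo, lo) when no positive-scoring segment exists.
    """
    best: Segment = (0.0, lo, lo)
    run, run_start = 0.0, lo
    for i in range(lo, hi):
        if run <= 0:
            # A non-positive running total cannot help; restart here.
            run, run_start = 0.0, i
        run += scores[i]
        if run > best[0]:
            best = (run, run_start, i + 1)
    return best


def top_segments(scores: Sequence[float], r: int) -> List[Segment]:
    """Greedily collect up to r disjoint positive-scoring segments.

    Hypothetical strategy: take the best segment, then search only the
    regions to its left and right, so segments never overlap.
    """
    found: List[Segment] = []
    regions = [(0, len(scores))]
    while regions and len(found) < r:
        candidates = [best_segment(scores, lo, hi) for lo, hi in regions]
        k = max(range(len(candidates)), key=lambda j: candidates[j][0])
        score, start, end = candidates[k]
        if score <= 0:
            break  # nothing positive left anywhere
        lo, hi = regions.pop(k)
        found.append((score, start, end))
        regions += [(lo, start), (end, hi)]
    return found


# Made-up per-position scores, e.g. from a hydrophobicity-like scale.
example_scores = [1, 2, -5, 3, 1, -4, 2]
segments = top_segments(example_scores, r=3)
combined = sum(score for score, _, _ in segments)
```

Whether a combined score like `combined` is surprising depends on the distribution expected by chance, which is the subject of the paper itself.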

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence*
  • Animals
  • Antithrombin III / genetics
  • Base Sequence*
  • Biological Evolution
  • Chickens
  • DNA*
  • Drosophila / genetics
  • Drosophila Proteins*
  • Eye Proteins / genetics
  • Fowlpox virus / genetics
  • Humans
  • Membrane Glycoproteins / genetics
  • Molecular Sequence Data
  • Probability
  • Proteins*
  • Receptor Protein-Tyrosine Kinases*
  • Receptors, Cell Surface / genetics
  • Receptors, Serotonin / genetics
  • Sequence Analysis*
  • Sequence Homology, Amino Acid*

Substances

  • Drosophila Proteins
  • Eye Proteins
  • Membrane Glycoproteins
  • Proteins
  • Receptors, Cell Surface
  • Receptors, Serotonin
  • Antithrombin III
  • DNA
  • Receptor Protein-Tyrosine Kinases
  • sev protein, Drosophila