A multivariate representation and analysis of DNA sequence data

Acta Chem Scand (Cph). 1991 Feb;45(2):186-92. doi: 10.3891/acta.chem.scand.45-0186.

Abstract

A new way to represent and analyze DNA sequence data is described. This approach complements the methods currently in use by allowing the systematic part of the variation between different sequences to be modeled. Such variation can prove as informative as the absence of variation (homology), which is the most widely used criterion for comparing sequence data. A multivariate sequence-activity model (SAM) for DNA promoter sequences is presented, in which relative promoter strength is modeled in terms of the primary DNA sequence. The model is shown to have good predictive capability. The coefficients of the model are interpreted and used to design new structures predicted to be strong promoters in the system investigated. The approach is also applicable to other kinds of sequence data, e.g. RNAs, proteins, or peptides.
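
The general idea of a sequence-activity model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the sequences were encoded or which regression method was used, so the one-hot encoding, the ridge regression fitted by gradient descent, the function names (`encode`, `fit_ridge`, `predict`, `design_strongest`), and the toy sequences and activity values are all assumptions introduced here for illustration only.

```python
BASES = "ACGT"


def encode(seq):
    """One-hot encode a DNA string: four indicator values per position (assumed encoding)."""
    vec = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def fit_ridge(X, y, lam=0.1, lr=0.05, steps=2000):
    """Fit y ~ b0 + X.w by gradient descent with an L2 penalty on w.

    Ridge regression is a stand-in chosen for this sketch; it is not
    claimed to be the method used in the paper.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(steps):
        resid = [b0 + sum(wj * xj for wj, xj in zip(w, row)) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n + lam * w[j]
                  for j in range(p)]
        b0 -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return b0, w


def predict(seq, b0, w):
    """Predicted activity for one sequence under the fitted linear model."""
    return b0 + sum(wj * xj for wj, xj in zip(w, encode(seq)))


def design_strongest(w, length):
    """At each position, pick the base with the largest coefficient."""
    return "".join(
        BASES[max(range(4), key=lambda k: w[4 * i + k])] for i in range(length)
    )


# Made-up toy data (hypothetical sequences and relative strengths, not from the paper).
toy_seqs = ["TATAAT", "TATGAT", "GATAAT", "CACGGC"]
toy_strength = [1.0, 0.7, 0.6, 0.1]

b0, w = fit_ridge([encode(s) for s in toy_seqs], toy_strength)
candidate = design_strongest(w, length=6)
```

Reading the coefficients position by position corresponds to the interpretation step mentioned in the abstract, and `design_strongest` corresponds to the design step. A sequence assembled this way can lie outside the range of the training data, so its predicted strength is an extrapolation that would need experimental confirmation.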

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • DNA, Bacterial / genetics*
  • Escherichia coli / genetics
  • Molecular Sequence Data
  • Multivariate Analysis
  • Promoter Regions, Genetic*

Substances

  • DNA, Bacterial