A linguistic representation of the regulation of transcription initiation. I. An ordered array of complex symbols with distinctive features

Biosystems. 1993;29(2-3):87-104. doi: 10.1016/0303-2647(93)90086-r.


The inadequacy of context-free grammars for describing the regulatory information contained in DNA provided the formal justification for a linguistic approach to the study of gene regulation. Based on that result, we have initiated a linguistic formalization of the regulatory arrays of 107 sigma 70 E. coli promoters. The complete sequences of promoters (Pr), operators (Op) and activator binding sites (I) have previously been identified as the smallest elements, or categories, for a combinatorial analysis of the range of transcription initiation of sigma 70 promoters. These categories are conceptually equivalent to the phonemes of natural language. A complete description of the regulatory arrays of promoters requires several features associated with these categories, so the properties pertinent to describing such regulatory regions must be selected and represented in the best way. In this paper we define distinctive features of regulatory regions based on the following criteria: identification of subclasses of substitutable elements, simplicity, selection of the most directly related information, and distinction of one array from the whole set of promoters. Alternative ways to represent distances between regulatory sites are discussed; together with a principle of precedence, they permit the identification of an ordered set of complex symbols as a unique representation for a promoter and its associated regulatory sites. In the accompanying paper, additional distinctive features of promoters and regulatory sites are identified.
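
The idea of an ordered array of complex symbols can be illustrated with a minimal sketch. It is not the paper's formalism: the abstract does not state which distance representation or precedence principle was adopted, so the `position` field (a coordinate relative to the transcription start), the left-to-right ordering rule, and the names `ComplexSymbol`, `ordered_array` and `render` are all assumptions made for illustration. The numeric values are invented and are not taken from the 107 promoters analyzed.

```python
from dataclasses import dataclass

# The three categories named in the abstract.
CATEGORIES = {"Pr", "Op", "I"}  # promoter, operator, activator binding site


@dataclass(frozen=True)
class ComplexSymbol:
    """A category plus its features (hypothetical structure)."""
    category: str
    position: float          # assumed feature: site center relative to +1
    features: tuple = ()     # placeholder for further distinctive features

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def ordered_array(symbols):
    """Order symbols left to right by position (an assumed precedence rule).

    Ties are broken by category name so the result is deterministic.
    """
    return tuple(sorted(symbols, key=lambda s: (s.position, s.category)))


def render(array):
    """Write the ordered array as a single string of symbols."""
    return " ".join(f"{s.category}[{s.position:+g}]" for s in array)


# Invented example: one activator site, one promoter, one operator.
sites = [
    ComplexSymbol("Op", 11.0),
    ComplexSymbol("Pr", -20.0),
    ComplexSymbol("I", -61.5),
]
print(render(ordered_array(sites)))  # I[-61.5] Pr[-20] Op[+11]
```

Under these assumptions, any input order of the same sites yields the same string, which is the sense in which an ordered set of complex symbols can serve as a unique representation of a regulatory array.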

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Binding Sites / genetics
  • DNA, Bacterial / genetics
  • Escherichia coli / genetics
  • Gene Expression Regulation, Bacterial
  • Linguistics
  • Models, Genetic*
  • Operator Regions, Genetic
  • Promoter Regions, Genetic
  • Systems Theory
  • Transcription, Genetic*


Substances

  • DNA, Bacterial