Statistical properties of open reading frames in complete genome sequences

Comput Chem. 1999 Jun 15;23(3-4):283-301. doi: 10.1016/s0097-8485(99)00014-5.


Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The influence of CG% on the size distribution is addressed. Compared with archaeal and eubacterial genomes, yeast chromosomes tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames in the six reading frames, and on the two strands, are distributed similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at the three codon positions is analyzed separately. In these genome sequences, the first codon position is shown to be G- and A-rich (i.e. purine-rich); A-rich and T-rich branches co-exist at the second codon position; and the third codon position is weakly T-rich.
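
The quantities named in the abstract (open reading frames in six reading frames, rank-size ordering, and base composition at each codon position) can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes, for illustration only, that an open reading frame is a stop-to-stop run of codons, that the standard stop codons TAA, TAG and TGA apply, and that the input is a plain DNA string; all function names are hypothetical.

```python
from collections import Counter

# Assumption: standard stop codons; the paper's exact ORF definition is not given here.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield stop-to-stop ORFs, each as a list of codons, for one frame."""
    current = []
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            if current:
                yield current
            current = []
        else:
            current.append(codon)
    if current:
        yield current


def six_frame_orfs(seq):
    """Map (strand, offset) to the list of ORFs found in that frame."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            result[(strand, offset)] = list(orfs_in_frame(s, offset))
    return result


def rank_size(orfs_by_frame):
    """Return (rank, size in codons) pairs, largest ORF first."""
    sizes = sorted(
        (len(orf) for orfs in orfs_by_frame.values() for orf in orfs),
        reverse=True,
    )
    return list(enumerate(sizes, start=1))


def composition_by_codon_position(orfs):
    """Count bases separately at codon positions 1, 2 and 3."""
    counts = [Counter(), Counter(), Counter()]
    for orf in orfs:
        for codon in orf:
            for pos, base in enumerate(codon):
                counts[pos][base] += 1
    return counts
```

Plotting the `rank_size` pairs on log-log axes gives a rank-size (Zipf) plot, and comparing the three counters from `composition_by_codon_position` between frames on the "+" and "-" strands is one simple way to look at strand asymmetry; the regression analysis described in the abstract is not reproduced here.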

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence
  • Codon
  • Genetic Code
  • Genome*
  • Models, Genetic
  • Open Reading Frames*
  • Software

