Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals

Nucleic Acids Res. 2000 Feb 15;28(4):1000-10. doi: 10.1093/nar/28.4.1000.

Abstract

The study of a few genes has permitted the identification of three elements that constitute a yeast polyadenylation signal: the efficiency element (EE), the positioning element and the actual site for cleavage and polyadenylation. In this paper we analyse the oligonucleotide composition of the sequences located downstream of the stop codon of all yeast genes. Several families of oligonucleotides (referred to herein as 'words') are over-represented with high significance. The family with the highest over-representation includes the oligonucleotides shown experimentally to act as EEs. The word with the highest score is TATATA, followed, among others, by a series of single-nucleotide variants (TATGTA, TACATA, TAAATA, etc.) and one-letter shifts (ATATAT). A positional analysis reveals that these words strongly prefer the 3' flanks of yeast genes, where they are very unevenly distributed, with a marked peak around 35 bp after the stop codon. Of the predicted ORFs, 85% show one or more of these sequences. Similar results were obtained using a data set of EST sequences. Other clusters of over-represented words are also detected, namely T-rich and A-rich signals. Using these results and previously known data, we propose a general model for the 3' trailers of yeast mRNAs.
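The abstract combines three computations: scoring word over-representation, locating words relative to the stop codon, and counting how many genes carry at least one word. The Python sketch below illustrates generic versions of these steps; it is not the authors' method. Assumptions: each input string is a 3' flank that begins with the first nucleotide after the stop codon; expected counts come from a simple mononucleotide background with a binomial z-score, which the paper may not use; the function names (`count_words`, `overrepresentation`, `position_histogram`, `fraction_with_any`) are hypothetical; and the toy sequences are synthetic, not yeast data.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def count_words(seqs, k):
    """Count overlapping k-mers ("words") across all sequences, skipping windows with non-ACGT bases."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if all(b in ALPHABET for b in w):
                counts[w] += 1
    return counts

def base_freqs(seqs):
    """Mononucleotide frequencies, used here as an assumed background model."""
    c = Counter(b for s in seqs for b in s if b in ALPHABET)
    total = sum(c.values())
    return {b: c[b] / total for b in ALPHABET}

def overrepresentation(seqs, k=6):
    """Return (word, z) pairs sorted by z, highest first.

    Expected count = (number of k-long windows) * product of base frequencies.
    The binomial variance ignores the dependence between overlapping occurrences
    (e.g. TATATA overlapping itself), so z is only a rough screening statistic.
    Windows containing non-ACGT bases are still counted in n_windows (a simplification).
    """
    counts = count_words(seqs, k)
    freqs = base_freqs(seqs)
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    scores = {}
    for w in map("".join, product(ALPHABET, repeat=k)):
        p = math.prod(freqs[b] for b in w)
        var = n_windows * p * (1 - p)
        if var > 0:
            scores[w] = (counts[w] - n_windows * p) / math.sqrt(var)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def position_histogram(seqs, word, bin_size=10):
    """Histogram of 0-based start offsets of `word` (offset 0 = first base after the stop codon, by assumption)."""
    hist = Counter()
    for s in seqs:
        i = s.find(word)
        while i != -1:  # include overlapping occurrences
            hist[(i // bin_size) * bin_size] += 1
            i = s.find(word, i + 1)
    return dict(sorted(hist.items()))

def fraction_with_any(seqs, words):
    """Fraction of sequences containing at least one of `words`."""
    return sum(any(w in s for w in words) for s in seqs) / len(seqs)

if __name__ == "__main__":
    # Synthetic toy flanks for illustration only, NOT yeast data.
    flanks = ["GCATATATAGCC", "TTTATGTAAAAA", "GGGCCCGGGCCC"]
    print(overrepresentation(flanks)[:3])
    print(position_histogram(flanks, "TATA", bin_size=5))
    print(fraction_with_any(flanks, ["TATATA", "TATGTA", "TACATA", "TAAATA"]))
```

On real data, the window length taken downstream of each stop codon and the choice of background model (mononucleotide, or a higher-order Markov chain) are further assumptions that would change which words score highest.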

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Cluster Analysis
  • Expressed Sequence Tags
  • Genome, Fungal*
  • Poly A / genetics*
  • Saccharomyces cerevisiae / genetics*

Substances

  • Poly A