Mining regulatory 5'UTRs from cDNA deep sequencing datasets

Nucleic Acids Res. 2010 Mar;38(5):1504-14. doi: 10.1093/nar/gkp1121. Epub 2009 Dec 7.

Abstract

Regulatory 5' untranslated regions (r5'UTRs) of mRNAs, such as riboswitches, modulate the expression of genes involved in varied biological processes in both bacteria and eukaryotes. New high-throughput sequencing technologies could provide powerful tools for the discovery of novel r5'UTRs, but the size and complexity of the datasets generated by these technologies make it difficult to differentiate r5'UTRs from the multitude of other types of RNAs detected. Here, we developed and implemented a bioinformatic approach to identify putative r5'UTRs within large datasets of RNAs recently identified by pyrosequencing of the Vibrio cholerae small transcriptome. This screen yielded only approximately 1% of all non-overlapping RNAs along with 75% of previously annotated r5'UTRs and 69 candidate V. cholerae r5'UTRs. These candidates include several putative functional homologues of diverse r5'UTRs characterized in other species, as well as numerous candidates upstream of genes involved in pathways not known to be regulated by r5'UTRs, such as fatty acid oxidation and peptidoglycan catabolism. Two of these novel r5'UTRs were experimentally validated using a GFP reporter-based approach. Our findings suggest that the number and diversity of pathways regulated by r5'UTRs have been underestimated and that deep sequencing-based transcriptomics will be extremely valuable in the search for novel r5'UTRs.
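
The abstract does not state the criteria used in the screen, so the following is only a minimal sketch of one plausible positional filter, not the authors' method. It assumes a sequenced RNA is a candidate when it lies on the same strand as an annotated gene, begins upstream of that gene's start, and ends either inside the gene or within a small gap of its start. The names `Interval`, `is_candidate`, `find_candidates` and the `max_gap` threshold of 50 nt are all hypothetical, and the coordinates in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A feature on one replicon; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_candidate(rna: Interval, gene: Interval, max_gap: int = 50) -> bool:
    """Return True if `rna` looks like a 5' leader of `gene`.

    Assumed (hypothetical) rule: same strand, the RNA's 5' end is upstream
    of the gene's 5' end, the RNA's 3' end does not run past the gene's
    3' end, and the gap between RNA 3' end and gene 5' end is <= max_gap
    (a negative gap means the RNA overlaps the start of the gene).
    """
    if rna.strand != gene.strand:
        return False
    if rna.strand == "+":
        starts_upstream = rna.start < gene.start
        stops_within_gene = rna.end < gene.end
        gap = gene.start - rna.end - 1
    else:
        # On the minus strand the 5' end is the higher coordinate.
        starts_upstream = rna.end > gene.end
        stops_within_gene = rna.start > gene.start
        gap = rna.start - gene.end - 1
    return starts_upstream and stops_within_gene and gap <= max_gap


def find_candidates(rnas, genes, max_gap: int = 50):
    """Pair every RNA with every gene it could be a 5' leader of."""
    return [
        (rna.name, gene.name)
        for rna in rnas
        for gene in genes
        if is_candidate(rna, gene, max_gap)
    ]


# Invented example data, for illustration only.
genes = [
    Interval("geneA", 200, 500, "+"),
    Interval("geneB", 1000, 1300, "-"),
]
rnas = [
    Interval("rna1", 100, 190, "+"),    # ends 9 nt upstream of geneA
    Interval("rna2", 250, 300, "+"),    # entirely inside geneA
    Interval("rna3", 1310, 1400, "-"),  # ends 9 nt upstream of geneB
    Interval("rna4", 100, 190, "-"),    # wrong strand for geneA
    Interval("rna5", 10, 60, "+"),      # too far upstream of geneA
]

candidates = find_candidates(rnas, genes)
print(candidates)
```

A real screen would likely combine such a positional rule with further evidence (for example read depth or sequence conservation); those steps are not described in the abstract and are left out here.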

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions*
  • Amino Acids / metabolism
  • Base Sequence
  • Conserved Sequence
  • DNA, Complementary / chemistry*
  • Data Mining
  • Down-Regulation
  • Gene Expression Profiling
  • Genes, Reporter
  • Genomics / methods*
  • Green Fluorescent Proteins / analysis
  • Green Fluorescent Proteins / genetics
  • Regulatory Sequences, Ribonucleic Acid*
  • Sequence Analysis, DNA*
  • Vibrio cholerae / genetics*
  • Vibrio cholerae / metabolism

Substances

  • 5' Untranslated Regions
  • Amino Acids
  • DNA, Complementary
  • Regulatory Sequences, Ribonucleic Acid
  • Green Fluorescent Proteins