Computational identification of new structured cis-regulatory elements in the 3'-untranslated region of human protein coding genes

Nucleic Acids Res. 2012 Oct;40(18):8862-73. doi: 10.1093/nar/gks684. Epub 2012 Jul 20.

Abstract

Messenger ribonucleic acids (RNAs) contain a large number of cis-regulatory RNA elements that function in many types of post-transcriptional regulation. These cis-regulatory elements are often characterized by conserved structures and/or sequences. Although some classes are well known, given the wide range of RNA-interacting proteins in eukaryotes, it is likely that many new classes of cis-regulatory elements are yet to be discovered. One way to find them is with computational methods, which can analyse genomic data, particularly comparative data, on a large scale. In this study, a set of structure-discovery algorithms was applied, followed by support vector machine (SVM) classification. We trained a new classification model (CisRNA-SVM) on a set of known structured cis-regulatory elements from 3'-untranslated regions (UTRs) and successfully distinguished both these elements and groups of cis-regulatory elements it had not been trained on from control genomic and shuffled sequences. The new method outperformed previous methods in classifying cis-regulatory RNA elements. This model was then used to predict new elements from cross-species conserved regions of human 3'-UTRs. Clustering of these elements identified new classes of potential cis-regulatory elements. The model, training and testing sets and novel human predictions are available at: http://mRNA.otago.ac.nz/CisRNA-SVM.
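The abstract names two ingredients of the training setup: shuffled sequences used as negative controls, and an SVM that separates known elements from controls. It does not give the feature set, kernel or shuffling scheme, so the following is only a minimal sketch under stated assumptions. It uses a simple mononucleotide shuffle (dinucleotide-preserving shuffles are often preferred when structure is being tested) and a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the regularised hinge loss. The function names `shuffled_control`, `train_linear_svm` and `predict`, and the two-dimensional feature vectors, are hypothetical placeholders, not the CisRNA-SVM implementation.

```python
import random
from typing import List, Sequence, Tuple


def shuffled_control(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: same length and base composition, order destroyed."""
    return "".join(rng.sample(seq, len(seq)))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 (known element) or -1 (control). A constant 1.0 feature is
    appended so the last weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data: List[Tuple[List[float], int]] = [
        (list(x) + [1.0], label) for x, label in zip(X, y)
    ]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink from the regulariser
            if margin < 1.0:  # hinge loss is active: step towards this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (predicted element) or -1 (predicted control)."""
    score = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(1)
    print(shuffled_control("GGCUAGCCUUAGC", rng))
    # Hypothetical 2-D feature vectors (e.g. a folding-energy score and a
    # conservation score); the real CisRNA-SVM features are not given here.
    X = [[2.0, 1.5], [1.8, 2.2], [2.5, 2.0],
         [-2.0, -1.0], [-1.5, -2.5], [-2.2, -1.8]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # expected: [1, 1, 1, -1, -1, -1]
```

A linear model trained by plain sub-gradient descent keeps the sketch dependency-free; a real pipeline would more likely use an established SVM library and possibly a non-linear kernel, and would compute features from the structure-discovery step rather than using toy values.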

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions*
  • Algorithms
  • Genomics / methods*
  • Humans
  • Nucleic Acid Conformation
  • Proteins / genetics
  • RNA / chemistry
  • Regulatory Sequences, Ribonucleic Acid*
  • Support Vector Machine

Substances

  • 3' Untranslated Regions
  • Proteins
  • Regulatory Sequences, Ribonucleic Acid
  • RNA