A computational approach to identify genes for functional RNAs in genomic sequences

Nucleic Acids Res. 2001 Oct 1;29(19):3928-38. doi: 10.1093/nar/29.19.3928.

Abstract

Currently there is no successful computational approach for identifying genes that encode novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach that uses neural networks and support vector machines to extract features common to known RNAs and to predict new RNA genes in the unannotated regions of bacterial and archaeal genomes. The Escherichia coli genome was used for development, but we have also applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing for bacteria and 90-99% accurate for hyperthermophilic archaea. Combining these predictions with those obtained from a second set of parameters, consisting of known RNA sequence motifs and the calculated free energy of folding, significantly improved accuracy. The method identified several known fRNAs that were not included in the training datasets and predicted several hundred novel RNAs. These studies indicate that simple genomes contain many unidentified RNAs that can be predicted computationally as a precursor to experimental study. Our RNA gene predictions and an interface for user predictions are publicly available via the web.
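The abstract does not specify the exact feature encoding or classifier architecture. The following is a minimal sketch that assumes "nucleotide composition" means mono- and dinucleotide frequencies. It shows how such features can be computed and how a jackknife (leave-one-out) test estimates accuracy. A simple nearest-centroid classifier stands in for the neural networks and support vector machines used in the study. The helper names (`composition`, `predict`, `jackknife_accuracy`), the labels `fRNA`/`other`, and the sequences are hypothetical toy data, not material or results from the paper.

```python
from itertools import product

# Assumed feature set: 4 mononucleotide + 16 dinucleotide frequencies.
MONO = ["A", "C", "G", "T"]
DI = ["".join(p) for p in product(MONO, repeat=2)]


def composition(seq: str) -> list[float]:
    """Normalized mono- and dinucleotide frequencies (assumes ACGT only, len >= 2)."""
    seq = seq.upper()
    n = len(seq)
    mono = [seq.count(b) / n for b in MONO]
    # Sliding window so that overlapping dinucleotides are all counted.
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    di = [pairs.count(d) / len(pairs) for d in DI]
    return mono + di


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(x: list[float], train: list[tuple[list[float], str]]) -> str:
    """Stand-in classifier: assign x to the label whose class centroid is nearest."""
    labels = {lab for _, lab in train}
    cents = {lab: centroid([f for f, l in train if l == lab]) for lab in labels}
    return min(cents, key=lambda lab: sq_dist(x, cents[lab]))


def jackknife_accuracy(data: list[tuple[list[float], str]]) -> float:
    """Leave-one-out: hold out each example, train on the rest, test on it."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(x, train) == y:
            correct += 1
    return correct / len(data)


# Hypothetical toy sequences: GC-rich "RNA-like" versus AT-rich background.
rna_like = ["GGCGCCGAGGCGGC", "GCCGGGCGCAGCCG", "CGGCGGCCGCGGAG"]
background = ["ATTATAAATTGTAT", "TTAATATAACATTA", "AATTTATAGATATT"]
data = [(composition(s), "fRNA") for s in rna_like] + [
    (composition(s), "other") for s in background
]
print(jackknife_accuracy(data))
```

On this deliberately easy toy set the two classes separate cleanly, so every held-out sequence is classified correctly. Real genomic data would need a trained model such as a neural network or SVM, and the abstract reports that composition features alone reached 80-90% accuracy for bacteria.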

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Escherichia coli / genetics
  • Forecasting
  • Genes, Archaeal*
  • Genes, Bacterial*
  • Genome, Archaeal
  • Genome, Bacterial
  • Neural Networks, Computer
  • RNA, Messenger / genetics
  • RNA, Untranslated / genetics*

Substances

  • RNA, Messenger
  • RNA, Untranslated