Plant noncoding RNA gene discovery by "single-genome comparative genomics"

RNA. 2011 Mar;17(3):390-400. doi: 10.1261/rna.2426511. Epub 2011 Jan 10.

Abstract

Plant genomes have undergone multiple rounds of duplications that contributed massively to the growth of gene families. The structure of resulting families has been studied in depth for protein-coding genes. However, little is known about the impact of duplications on noncoding RNA (ncRNA) genes. Here we perform a systematic analysis of duplicated regions in the rice genome in search of such ncRNA repeats. We observe that, just like their protein counterparts, most ncRNA genes have undergone multiple duplications that left visible sequence conservation footprints. The extent of ncRNA gene duplication in plants is such that these sequence footprints can be exploited for the discovery of novel ncRNA gene families on a large scale. We developed an SVM model that is able to retrieve likely ncRNA candidates among the 100,000+ repeat families in the rice genome, with a reasonably low false-positive discovery rate. Among the nearly 4000 ncRNA families predicted by this means, only 90 correspond to putative snoRNA or miRNA families. About half of the remaining families are classified as structured RNAs. New candidate ncRNAs are particularly enriched in UTR and intronic regions. Interestingly, 89% of the putative ncRNA families do not produce a detectable signal when their sequences are compared to another grass genome such as maize. Our results show that a large fraction of rice ncRNA genes are present in multiple copies and are species-specific or of recent origin. Intragenome comparison is a unique and potent source for the computational annotation of this major class of ncRNA.
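The proportions quoted in the abstract can be turned into rough family counts with a few lines of arithmetic. The sketch below is only illustrative: it assumes "nearly 4000" means exactly 4000 families and "about half" means exactly one half, neither of which the abstract states, and the function name and parameters are hypothetical.

```python
def abstract_counts(total_families=4000, sno_mi_families=90,
                    frac_no_maize_signal=0.89):
    """Rough counts implied by the abstract's figures.

    Assumptions (not from the source): 'nearly 4000' is taken as 4000,
    and 'about half' is taken as exactly one half.
    """
    # Families that are not putative snoRNA or miRNA families.
    remaining = total_families - sno_mi_families
    return {
        # Share of predicted families that are putative snoRNA/miRNA.
        "sno_mi_share": sno_mi_families / total_families,
        "remaining": remaining,
        # 'About half of the remaining families' classified as structured RNAs.
        "structured_estimate": remaining // 2,
        # Families with no detectable signal against maize.
        "no_maize_signal_estimate": round(total_families * frac_no_maize_signal),
    }


if __name__ == "__main__":
    for name, value in abstract_counts().items():
        print(f"{name}: {value}")
```

Under these assumptions, putative snoRNA and miRNA families make up only about 2% of the predictions, and several thousand families would have no detectable maize counterpart, which is the basis for the abstract's claim that many rice ncRNA families are species-specific or of recent origin.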

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Comparative Genomic Hybridization*
  • Computational Biology*
  • Genes, Plant / genetics*
  • Genome, Plant*
  • Molecular Sequence Data
  • Multigene Family
  • Nucleic Acid Conformation
  • Plants / genetics*
  • Polymerase Chain Reaction
  • RNA, Plant / genetics*
  • RNA, Untranslated / chemistry
  • RNA, Untranslated / genetics*
  • Sequence Homology, Nucleic Acid

Substances

  • RNA, Plant
  • RNA, Untranslated