Identification of real microRNA precursors with a pseudo structure status composition approach

PLoS One. 2015 Mar 30;10(3):e0121501. doi: 10.1371/journal.pone.0121501. eCollection 2015.


A microRNA (miRNA) is a small non-coding RNA molecule of about 22 nucleotides that functions in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Although poorly characterized, miRNAs are widely regarded as important regulators of biological processes. Aberrant miRNA expression has been observed in many cancers and other disease states, indicating that miRNAs are deeply implicated in these diseases, particularly in carcinogenesis. It is therefore important for both basic research and miRNA-based therapy to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). In particular, with the avalanche of RNA sequences generated in the postgenomic age, computational sequence-based methods for this task are highly desirable. Here, two new predictors, called "iMcRNA-PseSSC" and "iMcRNA-ExPseSSC", are proposed for identifying human pre-microRNAs by incorporating global or long-range structure-order information in a way quite similar to the pseudo amino acid composition approach. Rigorous cross-validations on a much larger and more stringent newly constructed benchmark dataset showed that the two new predictors, which are accessible online, outperformed or were highly comparable with the best existing predictors in this area.
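The abstract does not spell out the feature construction, so the following is only a minimal sketch of the general idea of a pseudo-composition vector in the style of pseudo amino acid composition, applied to structure statuses. It is not the published iMcRNA-PseSSC algorithm. The eight-symbol status alphabet, the numeric `PROPERTY` values, and the names `structure_statuses` and `pse_composition` are all assumptions made for illustration. The input is assumed to be an uppercase A/C/G/U sequence with a dot-bracket secondary structure of the same length.

```python
from collections import Counter

# Hypothetical status alphabet: nucleotide + "p" (paired) or "u" (unpaired).
ALPHABET = [nt + flag for nt in "ACGU" for flag in "pu"]

# Placeholder numeric property per status (illustrative values, not from the paper).
PROPERTY = {nt + "p": score for nt, score in
            {"A": 0.5, "U": 0.5, "G": 1.0, "C": 1.0}.items()}
PROPERTY.update({nt + "u": 0.0 for nt in "ACGU"})


def structure_statuses(seq, dot_bracket):
    """Tag each nucleotide as paired or unpaired using a dot-bracket string."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return [nt + ("u" if s == "." else "p") for nt, s in zip(seq, dot_bracket)]


def pse_composition(statuses, max_lag=2, weight=0.5):
    """Return status frequencies followed by weighted lag-correlation factors.

    The first len(ALPHABET) components capture local composition; the last
    max_lag components capture longer-range order along the sequence.
    """
    n = len(statuses)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    counts = Counter(statuses)
    freqs = [counts[s] / n for s in ALPHABET]
    thetas = []
    for lag in range(1, max_lag + 1):
        # Mean squared property difference between positions `lag` apart.
        diffs = [(PROPERTY[statuses[i]] - PROPERTY[statuses[i + lag]]) ** 2
                 for i in range(n - lag)]
        thetas.append(sum(diffs) / (n - lag))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


statuses = structure_statuses("GGAUCC", "((..))")
features = pse_composition(statuses, max_lag=2, weight=0.5)
```

The resulting fixed-length vector (here 8 composition terms plus 2 correlation terms, normalized to sum to 1) could then be passed to any standard classifier. The choice of `max_lag` and `weight` is arbitrary in this sketch.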

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / genetics
  • Base Sequence
  • Computational Biology / methods
  • Genome, Human / genetics
  • Humans
  • MicroRNAs / genetics*
  • Nucleic Acid Conformation
  • Nucleotides / genetics
  • RNA Precursors / genetics*
  • RNA, Small Untranslated / genetics


Substances

  • Amino Acids
  • MicroRNAs
  • Nucleotides
  • RNA Precursors
  • RNA, Small Untranslated

Associated data

  • figshare/10.6084/m9.figshare.1289312

Grant support

This work was supported by the National Natural Science Foundation of China (No. 61300112, 61272383), the Scientific Research Innovation Foundation in Harbin Institute of Technology (Project No. HIT.NSRIF.2013103), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, and State Education Ministry. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.