Literature Mining of Disease Associated Noncoding RNA in the Omics Era

Molecules. 2022 Jul 23;27(15):4710. doi: 10.3390/molecules27154710.

Abstract

Noncoding RNAs (ncRNAs) are transcripts without protein-coding potential that play fundamental regulatory roles in diverse cellular processes and diseases. The application of deep sequencing experiments in ncRNA research has generated massive omics datasets, which require rapid examination, interpretation and validation based on existing knowledge resources. Thus, text-mining methods have been increasingly adapted for automatic extraction of relations between an ncRNA and its target or a disease condition from biomedical literature. These bioinformatics tools can also assist in more complex research tasks, such as database curation of candidate ncRNAs and hypothesis generation with respect to pathophysiological mechanisms. In this concise review, we first introduced the basic concepts and workflow of literature mining systems. Then, we compared available bioinformatics tools tailored for ncRNA studies, including their tasks, applicability, and limitations. The utility and flexibility of these tools are demonstrated by examples in a variety of diseases, such as Alzheimer's disease, atherosclerosis and cancers. Finally, we outlined several challenges from the viewpoints of both system developers and end users. We concluded that the application of text-mining techniques will boost disease-associated ncRNA discoveries in the biomedical literature and enable integrative biology in the current omics era.

Keywords: biomedical literature mining; deep sequencing; ncRNA; omics.

Publication types

  • Review

MeSH terms

  • Computational Biology / methods
  • Data Mining* / methods
  • Publications
  • RNA, Untranslated* / genetics

Substances

  • RNA, Untranslated

Grants and funding

This research received no external funding.