Exploring microRNA Regulation of Cancer with Context-Aware Deep Cancer Classifier

Pac Symp Biocomput. 2019:24:160-171.

Abstract

Background: MicroRNAs (miRNAs) are small, non-coding RNAs that regulate gene expression through post-transcriptional silencing. Differential miRNA expression, combined with advances in deep learning (DL), has the potential to improve cancer classification by modelling non-linear miRNA-phenotype associations. We propose a novel miRNA-based deep cancer classifier (DCC) that incorporates genomic and hierarchical tissue annotation and can accurately predict the presence of cancer in a wide range of human tissues.

Methods: miRNA expression profiles were analyzed for 1746 neoplastic and 3871 normal samples, across 26 types of cancer involving six organ sub-structures and 68 cell types. miRNAs were ranked and filtered using a specificity score representing their information content in relation to neoplasticity, incorporating three levels of hierarchical biological annotation. A DL architecture composed of stacked autoencoders (AE) and a multi-layer perceptron (MLP) was trained to predict neoplasticity using 497 abundant and informative miRNAs. Additional DCCs were trained using the expression of miRNA cistrons and sequence families, and combined into a diagnostic ensemble. Important miRNAs were identified using backpropagation and analyzed in Cytoscape using iCTNet and BiNGO.
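The abstract does not give the formula for the specificity score, so the following is only a minimal sketch of one information-based ranking, assuming each miRNA is binarized at its median expression and scored by its mutual information with the neoplastic/normal label. The function names, the median cutoff, and the toy miRNA names ("miR-A", "miR-B") are hypothetical, and the hierarchical tissue annotation described above is not modelled here.

```python
import math
from collections import Counter


def mutual_information(states, labels):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(labels)
    joint = Counter(zip(states, labels))
    p_state = Counter(states)
    p_label = Counter(labels)
    mi = 0.0
    for (s, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        ratio = p_xy * n * n / (p_state[s] * p_label[y])
        mi += p_xy * math.log2(ratio)
    return mi


def rank_mirnas(expression, labels):
    """Rank miRNAs by mutual information with the class label.

    expression: dict mapping a miRNA name to its per-sample expression values.
    labels: per-sample class labels (1 = neoplastic, 0 = normal).
    Assumption: each miRNA is binarized at its median before scoring.
    """
    scores = {}
    for name, values in expression.items():
        cutoff = sorted(values)[len(values) // 2]
        states = [v >= cutoff for v in values]
        scores[name] = mutual_information(states, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: "miR-A" separates the classes, "miR-B" mostly does not.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "miR-A": [9.0, 8.5, 9.2, 1.0, 1.5, 0.8],
    "miR-B": [5.0, 1.0, 5.2, 4.9, 1.1, 5.1],
}
ranking = rank_mirnas(expression, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A filter step would then keep only the top-scoring miRNAs; the cutoff that yields the 497 miRNAs mentioned above is not stated in the abstract.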

Results: Nested four-fold cross-validation was used to assess the performance of the DL model. The model achieved an accuracy, area under the ROC curve (AUC), sensitivity, and specificity of 94.73%, 98.6%, 95.1%, and 94.3%, respectively.
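For readers unfamiliar with the evaluation scheme, the sketch below illustrates how nested four-fold cross-validation partitions samples and how accuracy, sensitivity, and specificity follow from the confusion counts. It is a generic illustration, not the authors' pipeline: the helper names are hypothetical, the folds are taken by simple striding without the shuffling or stratification a real study would likely use, and the toy predictions are invented purely to exercise the code.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k disjoint, nearly equal folds (by striding)."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k=4):
    """Yield (train, validation, test) index lists for nested k-fold CV.

    The outer loop holds out one fold for testing; the inner loop splits the
    remaining samples into k folds for model selection.
    """
    all_idx = list(range(n_samples))
    for outer_test in k_fold_indices(all_idx, k):
        held_out = set(outer_test)
        remainder = [i for i in all_idx if i not in held_out]
        for inner_val in k_fold_indices(remainder, k):
            val = set(inner_val)
            train = [i for i in remainder if i not in val]
            yield train, inner_val, outer_test


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = neoplastic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# 16 toy samples give 4 outer folds x 4 inner folds = 16 splits.
splits = list(nested_cv_splits(16, k=4))

# Invented predictions: one neoplastic sample is missed.
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])
print(len(splits), metrics)
```

The point of the nesting is that the outer test fold is never seen while hyperparameters are chosen on the inner folds, so the reported metrics are not biased by model selection.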

Conclusion: Deep autoencoder networks are a powerful tool for modelling complex miRNA-phenotype associations in cancer. The proposed DCC improves classification accuracy by learning from the biological context of both samples and miRNAs, using anatomical and genomic annotation. Analyzing the deep structure of DCCs with backpropagation can also facilitate biological discovery by enabling gene ontology searches on the most significant features.
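The abstract does not specify how backpropagation was used to score features. One common approach, shown here only as an assumed illustration, is to backpropagate the predicted probability to the inputs and rank features by the magnitude of the resulting gradient. The tiny network below (one tanh hidden layer, sigmoid output) and its weights are hypothetical stand-ins for a trained DCC.

```python
import math


def forward(x, W1, b1, w2, b2):
    """Forward pass: tanh hidden layer followed by a sigmoid output unit."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    prob = 1.0 / (1.0 + math.exp(-logit))
    return hidden, prob


def input_gradient(x, W1, b1, w2, b2):
    """Gradient of the output probability with respect to each input feature."""
    hidden, prob = forward(x, W1, b1, w2, b2)
    d_logit = prob * (1.0 - prob)  # derivative of the sigmoid
    # Chain rule through the tanh layer: d tanh(z)/dz = 1 - tanh(z)^2
    d_pre = [d_logit * w * (1.0 - h * h) for w, h in zip(w2, hidden)]
    return [
        sum(d_pre[i] * W1[i][j] for i in range(len(W1)))
        for j in range(len(x))
    ]


# Hypothetical weights for 3 input features and 2 hidden units.
W1 = [[1.5, 0.0, -0.2],
      [0.8, 0.1, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 0.5]
b2 = -0.1
x = [0.5, 0.3, -0.4]  # one toy expression profile

grad = input_gradient(x, W1, b1, w2, b2)
saliency = [abs(g) for g in grad]
# Feature indices ordered from most to least influential for this sample.
order = sorted(range(len(x)), key=lambda j: saliency[j], reverse=True)
print(order, saliency)
```

In practice such per-sample gradients would typically be aggregated over many samples before the top-ranked miRNAs are passed to a downstream tool for gene ontology analysis.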

MeSH terms

  • Computational Biology
  • Databases, Nucleic Acid / statistics & numerical data
  • Deep Learning*
  • Diagnosis, Computer-Assisted / methods
  • Female
  • Gene Expression Profiling / statistics & numerical data
  • Gene Expression Regulation, Neoplastic
  • Gene Ontology
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Male
  • MicroRNAs / classification
  • MicroRNAs / genetics*
  • Molecular Sequence Annotation
  • Neoplasms / classification
  • Neoplasms / diagnosis
  • Neoplasms / genetics*
  • Neural Networks, Computer
  • Sequence Analysis, RNA

Substances

  • MicroRNAs