Pan-cancer analysis of expressed somatic nucleotide variants in long intergenic non-coding RNA

Pac Symp Biocomput. 2018;23:512-523.

Abstract

Long intergenic non-coding RNAs (lincRNAs) have been shown to play important roles in cancer. However, because lincRNAs are a relatively new class of RNAs compared to protein-coding mRNAs, their mutational landscape has not been studied as extensively. Here we characterize expressed somatic nucleotide variants within lincRNAs using 12 cancer RNA-Seq datasets from The Cancer Genome Atlas (TCGA). We build machine-learning models to discriminate somatic variants from germline variants within lincRNA regions (AUC 0.987). We build another model to differentiate lincRNA somatic mutations from background regions (AUC 0.72) and find several molecular features that are strongly associated with lincRNA mutations, including copy number variation, conservation, substitution type, and histone mark features.
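Both models above are summarized by AUC, the area under the ROC curve. As an illustration of what that number measures, the following minimal Python sketch computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly, which is equivalent to the Mann-Whitney U statistic. It is not the authors' pipeline: the function name `roc_auc`, the label encoding (1 for somatic, 0 for germline), and the toy scores are all assumptions made for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for the positive class (here assumed to mean somatic),
            0 for the negative class (here assumed to mean germline).
    scores: classifier outputs; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs ranked in the right order; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six variants (made-up values, not data from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.987 indicates near-perfect ranking of somatic over germline variants, while 0.72 indicates a moderate but clearly non-random signal. The pairwise loop is quadratic in the number of examples; a rank-based implementation would be preferable for real datasets.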

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Conserved Sequence
  • DNA Copy Number Variations
  • Databases, Nucleic Acid / statistics & numerical data
  • Female
  • Genetic Variation
  • Germ-Line Mutation
  • Histones / genetics
  • Histones / metabolism
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Machine Learning
  • Male
  • Models, Genetic
  • Models, Statistical
  • Mutation
  • Neoplasms / genetics*
  • Neoplasms / metabolism
  • Neural Networks, Computer
  • Nonlinear Dynamics
  • RNA, Long Noncoding / genetics*
  • RNA, Neoplasm / genetics*
  • Sequence Analysis, RNA / statistics & numerical data

Substances

  • Histones
  • RNA, Long Noncoding
  • RNA, Neoplasm