Comprehensive assembly of novel transcripts from unmapped human RNA-Seq data and their association with cancer

Mol Syst Biol. 2015 Aug 7;11(8):826. doi: 10.15252/msb.156172.

Abstract

Crucial parts of the genome, including genes encoding microRNAs and noncoding RNAs, went unnoticed for years, and even now, despite extensive annotation and assembly of the human genome, RNA-sequencing continues to yield millions of unmappable and thus uncharacterized reads. Here, we examined >300 billion reads from 536 normal donors and 1,873 patients encompassing 21 cancer types, identified ~300 million such uncharacterized reads, and, using a distinctive approach, de novo assembled 2,550 novel human transcripts, which mainly represent long noncoding RNAs. Of these, 230 exhibited relatively specific expression or non-expression in certain cancer types, making them potential markers for those cancers, whereas 183 exhibited tissue specificity. Moreover, we used lentiviral-mediated expression of three selected transcripts that had higher expression in normal donors than in cancer patients and found that each inhibited the growth of HepG2 cells. Our analysis provides a comprehensive and unbiased resource of unmapped human transcripts and reveals their associations with specific cancers, providing potentially important new genes for therapeutic targeting.
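
The proportions implied by the counts in the abstract can be checked with a few lines of arithmetic. The sketch below is an illustration only and is not part of the authors' analysis: the variable names are hypothetical, the total read count is reported as a lower bound (more than 300 billion), and the uncharacterized read count is approximate, so the first ratio should be read as a rough upper estimate.

```python
# Counts taken from the abstract; all names here are illustrative assumptions.
TOTAL_READS = 300_000_000_000        # reported as ">300 billion" (lower bound)
UNCHARACTERIZED_READS = 300_000_000  # reported as "~300 million" (approximate)
NOVEL_TRANSCRIPTS = 2550             # de novo assembled transcripts
CANCER_ASSOCIATED = 230              # specific expression or non-expression in certain cancers
TISSUE_SPECIFIC = 183                # tissue-specific transcripts


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


unmapped_pct = percent(UNCHARACTERIZED_READS, TOTAL_READS)
cancer_pct = percent(CANCER_ASSOCIATED, NOVEL_TRANSCRIPTS)
tissue_pct = percent(TISSUE_SPECIFIC, NOVEL_TRANSCRIPTS)

print(f"Uncharacterized reads: about {unmapped_pct:.2f}% of all reads examined")
print(f"Cancer-associated transcripts: {cancer_pct:.1f}% of novel transcripts")
print(f"Tissue-specific transcripts: {tissue_pct:.1f}% of novel transcripts")
```

On these figures, roughly one read in a thousand was uncharacterized, and fewer than one in ten of the assembled transcripts showed cancer-associated expression.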

Keywords: cancer‐associated transcripts; long noncoding RNAs; unmapped sequencing reads.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Animals
  • Base Sequence
  • Biomarkers, Tumor / genetics*
  • Cell Line, Tumor
  • Cell Proliferation / genetics
  • Chromosome Mapping
  • Gene Expression Profiling
  • Genome / genetics
  • Gorilla gorilla / genetics
  • HEK293 Cells
  • Hep G2 Cells
  • High-Throughput Nucleotide Sequencing
  • Histones / genetics
  • Humans
  • MicroRNAs / genetics*
  • Molecular Sequence Annotation
  • Neoplasms / genetics*
  • Pan troglodytes / genetics
  • RNA, Long Noncoding / genetics*
  • Sequence Analysis, RNA

Substances

  • Biomarkers, Tumor
  • Histones
  • MicroRNAs
  • RNA, Long Noncoding