UNAGI: an automated pipeline for nanopore full-length cDNA sequencing uncovers novel transcripts and isoforms in yeast

Funct Integr Genomics. 2020 Jul;20(4):523-536. doi: 10.1007/s10142-020-00732-1. Epub 2020 Jan 18.

Abstract

Sequencing the entire RNA molecule leads to a better understanding of the transcriptome architecture. SMARTer (Switching Mechanism at 5'-End of RNA Template) is a technology aimed at generating full-length cDNA from low amounts of mRNA for sequencing by short-read sequencers such as those from Illumina. However, short-read sequencing, such as Illumina technology, involves fragmentation, which results in bias and information loss. Here, we built a pipeline, UNAGI (UNAnnotated Gene Identifier), to process long reads obtained with nanopore sequencing, and compared it with the standard Illumina pipeline by studying the Saccharomyces cerevisiae transcriptome in full-length cDNA samples generated from two different biological samples: haploid and diploid cells. Additionally, we processed the long reads with another long-read tool, FLAIR. Our strand-aware method revealed significant differential gene expression that was masked in Illumina data by antisense transcripts. In transcript reconstruction, UNAGI outperformed both the Illumina pipeline and FLAIR (sensitivity and specificity of 80% and 40% for UNAGI vs. 18% and 34% for the Illumina pipeline and 79% and 32% for FLAIR). Moreover, UNAGI discovered 3877 unannotated transcripts, including 1282 intergenic transcripts, while the Illumina pipeline discovered only 238 unannotated transcripts. For isoform profiling, UNAGI also outperformed the Illumina pipeline and FLAIR in terms of sensitivity (91% vs. 82% and 63%, respectively). However, the low accuracy of nanopore sequencing led to a narrower gap in specificity with the Illumina pipeline (70% vs. 63%) and to a very large gap with FLAIR (70% vs. 0.02%).
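
To illustrate how antisense transcripts can mask differential expression when strand information is ignored, the following is a minimal sketch in Python. It is not UNAGI's implementation: the gene name, read counts, and the helper functions `count_reads` and `fold_change` are hypothetical and chosen only to make the effect visible.

```python
from collections import Counter

def count_reads(reads, stranded):
    """Count reads per gene.

    reads: iterable of (gene, orientation) pairs, where orientation is
    "sense" or "antisense" relative to the annotated gene.
    stranded: if True, only sense reads are counted; if False, every
    overlapping read is counted, as with unstranded data.
    """
    counts = Counter()
    for gene, orientation in reads:
        if stranded and orientation != "sense":
            continue
        counts[gene] += 1
    return counts

def fold_change(counts_a, counts_b, gene):
    """Ratio of counts in condition B over condition A for one gene."""
    return counts_b[gene] / counts_a[gene]

# Hypothetical toy data: the sense transcript increases fourfold in
# diploid cells while the antisense transcript decreases by the same amount.
haploid = [("GENE_A", "sense")] * 20 + [("GENE_A", "antisense")] * 80
diploid = [("GENE_A", "sense")] * 80 + [("GENE_A", "antisense")] * 20

unstranded_fc = fold_change(
    count_reads(haploid, stranded=False),
    count_reads(diploid, stranded=False),
    "GENE_A",
)
stranded_fc = fold_change(
    count_reads(haploid, stranded=True),
    count_reads(diploid, stranded=True),
    "GENE_A",
)

print(unstranded_fc)  # 1.0: the total read count is unchanged
print(stranded_fc)    # 4.0: the sense-strand change becomes visible
```

In this toy case the unstranded totals are identical in both conditions, so the change in the sense transcript only appears once reads are separated by strand.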

Keywords: Annotation; Differential gene expression; Full-length cDNA; Illumina; Isoforms; Nanopore sequencing; Stranding.

MeSH terms

  • DNA, Complementary / chemistry
  • DNA, Complementary / genetics
  • DNA, Fungal / chemistry
  • DNA, Fungal / genetics
  • Nanopore Sequencing / methods*
  • Ploidies
  • RNA, Messenger / chemistry
  • RNA, Messenger / genetics
  • Saccharomyces cerevisiae
  • Software

Substances

  • DNA, Complementary
  • DNA, Fungal
  • RNA, Messenger