Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq

Genome Res. 2025 Apr 14;35(4):1053-1064. doi: 10.1101/gr.279864.124.

Abstract

While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not advanced at the same pace. Long-read RNA sequencing offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences (PacBio) Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to the existing short-read-based annotation. At the locus level, the two annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.

MeSH terms

  • Animals
  • Genome*
  • Molecular Sequence Annotation* / methods
  • RNA-Seq* / methods
  • Sequence Analysis, RNA* / methods