Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation

Genome Res. 2025 Apr 14;35(4):583-592. doi: 10.1101/gr.279865.124.

Abstract

Long-read sequencing (LRS) technologies have revolutionized transcriptomic research by enabling the comprehensive sequencing of full-length transcripts. Using these technologies, researchers have reported tens of thousands of novel transcripts, even in well-annotated genomes, while developing new algorithms and experimental approaches to handle the noisy data. The Long-read RNA-seq Genome Annotation Assessment Project community effort benchmarked LRS methods in transcriptomics and validated many novel, lowly expressed, often sample-specific transcripts identified by long reads. These molecules represent deviations from the major transcriptional program that were overlooked by short-read sequencing methods but are now captured by the full-length, single-molecule approach. This Perspective discusses the challenges and opportunities associated with the capacity of LRS to unravel this fraction of the transcriptome, in terms of both transcriptome biology and genome annotation. For transcriptome biology, we need to develop novel experimental and computational methods to effectively differentiate technology errors from rare but real molecules. For genome annotation, we must agree on a strategy to capture molecular variability while still defining reference annotations that are useful for the genomics community.

Publication types

  • Review

MeSH terms

  • Animals
  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • Genomics* / methods
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Molecular Sequence Annotation* / methods
  • Sequence Analysis, RNA* / methods
  • Transcription, Genetic*
  • Transcriptome*