Background: Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3' ends. Most APA occurs within 3' UTRs, which harbor regulatory elements that can impact mRNA stability, translation, and localization.
Results: APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard, short-read RNA-seq datasets. Here, we benchmarked five such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) against 3'-Seq, a specialized RNA-seq protocol that enriches for reads at the 3' ends of genes, and Iso-Seq, a Pacific Biosciences (PacBio) single-molecule full-length RNA-seq method, in their ability to identify polyadenylation sites and quantify polyadenylation site usage. We demonstrate that 3'-Seq and Iso-Seq identify polyadenylation sites and quantify their usage more reliably than computational tools that take short-read RNA-seq as input. However, we find that running one such tool, QAPA, with a set of polyadenylation site annotations derived from small quantities of 3'-Seq or Iso-Seq data can reliably quantify variation in APA across conditions, such as across genotypes, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTL).
Conclusions: We envisage that our analyses will shed light on the advantages of studying APA with more specialized sequencing protocols, such as 3'-Seq or Iso-Seq, and the limitations of studying APA with short-read RNA-seq. We provide a computational pipeline to aid in the identification of polyadenylation sites and the quantification of polyadenylation site usage using Iso-Seq data as input.
Keywords: 3′-Seq; Alternative polyadenylation; Benchmarking; Isoform analysis; Long-read sequencing; PacBio Iso-Seq; QTL; RNA processing; RNA-seq.
© 2021. The Author(s).