Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease

Hum Mol Genet. 2022 Aug 12;ddac196. doi: 10.1093/hmg/ddac196. Online ahead of print.


Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases, and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identifying the isoform or protein products that correspond to disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, and the linkage of RNA isoforms to protein-level functions, and we comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.

Keywords: GWAS; sQTL; Long-read RNA-seq; Isoform; Alternative splicing.