2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Matthew T Parker; Katarzyna Knop; Geoffrey J Barton; Gordon G Simpson

doi:10.1186/s13059-021-02296-0

Genome Biol. 2021 Mar 1;22(1):72. doi: 10.1186/s13059-021-02296-0.

Authors

Matthew T Parker¹, Katarzyna Knop², Geoffrey J Barton², Gordon G Simpson³ ⁴

Affiliations

¹ School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK. m.t.parker@dundee.ac.uk.
² School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK.
³ School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK. g.g.simpson@dundee.ac.uk.
⁴ James Hutton Institute, Invergowrie, DD2 5DA, UK. g.g.simpson@dundee.ac.uk.

Abstract

Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.

Keywords: Gene expression; Long-read sequencing; Machine learning; Nanopore sequencing; RNA-seq; Spliced alignment; Splicing; Transcriptome assembly.
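
The two-pass idea in the abstract can be illustrated with a toy sketch. This is not the 2passtools implementation: the function names, the read-support threshold, and the canonical GT..AG motif check are all assumptions made for illustration, with the motif check standing in for the machine-learning-derived sequence information and the support count standing in for the alignment metrics. Junctions are assumed to be 0-based, half-open intron intervals collected from a first-pass alignment.

```python
from collections import Counter

def filter_junctions(observations, genome, min_support=2):
    """Keep junctions seen in enough reads whose intron starts GT and ends AG.

    Hypothetical stand-in for a real filter: `observations` is a list of
    (chrom, start, end) introns from first-pass alignments, one per read.
    """
    support = Counter(observations)
    kept = set()
    for (chrom, start, end), n_reads in support.items():
        seq = genome[chrom]
        donor = seq[start:start + 2]      # first two intron bases
        acceptor = seq[end - 2:end]       # last two intron bases
        if n_reads >= min_support and donor == "GT" and acceptor == "AG":
            kept.add((chrom, start, end))
    return kept

def to_bed_lines(junctions):
    """Format retained junctions as tab-separated BED-like lines."""
    return [f"{c}\t{s}\t{e}" for c, s, e in sorted(junctions)]

# Toy genome: exon (0-5), intron (5-15), exon (15-20).
genome = {"chr1": "ATGCC" + "GTAAATTTAG" + "CCGGA"}

# First-pass junctions: three reads agree, one read is shifted by an error.
first_pass = [("chr1", 5, 15)] * 3 + [("chr1", 6, 15)]

kept = filter_junctions(first_pass, genome)
bed_lines = to_bed_lines(kept)
```

In a second pass, the retained junctions would be handed to the aligner as guides, so that reads are realigned in favour of the filtered set rather than the error-shifted positions.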

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Introns
Machine Learning*
Molecular Sequence Annotation
RNA Splice Sites*
RNA Splicing
RNA-Seq* / methods
Reproducibility of Results
Sequence Alignment / methods*
Software*

Substances

RNA Splice Sites
