Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies

Haridha Shivram; Vishwanath R Iyer

doi:10.1261/rna.066217.118


RNA. 2018 Sep;24(9):1266-1274. doi: 10.1261/rna.066217.118. Epub 2018 Jun 27.

Authors

Haridha Shivram¹, Vishwanath R Iyer¹

Affiliation

¹ Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA.

Abstract

The quality of RNA sequencing data relies on specific priming by the primer used for reverse transcription (RT-primer). Nonspecific annealing of the RT-primer to the RNA template (RT mispriming) can generate reads with incorrect cDNA ends and can cause misinterpretation of data. This kind of artifact in RNA-seq-based technologies is underappreciated, and no adequate tools currently exist to computationally remove such artifacts from published data sets. We show that mispriming can occur with as little as two bases of complementarity at the 3' end of the primer followed by intermittent regions of complementarity. We also provide a computational pipeline that identifies cDNA reads produced from RT mispriming, allowing users to filter them out from any aligned data set. Using this analysis pipeline, we identify thousands of mispriming events in a dozen published data sets from diverse technologies including short RNA-seq, total/mRNA-seq, HITS-CLIP, and GRO-seq. We further show how RT mispriming can lead to misinterpretation of data. In addition to providing a solution to computationally remove RT-misprimed reads, we also propose an experimental solution to completely avoid RT mispriming by performing RNA-seq using thermostable group II intron-derived reverse transcriptase (TGIRT-seq).
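The two-base complementarity criterion can be illustrated with a minimal Python sketch. It is not the authors' pipeline, whose implementation the abstract does not describe. It assumes that a candidate misprimed read is one whose 3' end is immediately followed, on the template, by a sequence that can base-pair with the last bases of the RT-primer. The function names, the `n=2` default, and the primer sequence are hypothetical, and the sketch ignores the intermittent upstream complementarity mentioned above.

```python
# Hypothetical helper names; the primer below is a made-up example sequence.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def priming_site(rt_primer: str, n: int = 2) -> str:
    """Template-sense sequence that pairs with the last n bases (3' end) of the primer."""
    return revcomp(rt_primer[-n:]).upper()


def is_candidate_misprimed(downstream: str, rt_primer: str, n: int = 2) -> bool:
    """Flag a read if the template sequence just past its 3' end can pair
    with the 3'-terminal n bases of the RT-primer.

    `downstream` is assumed to be given in the same strand orientation
    as the read (5' to 3'), starting at the first base after the read end.
    """
    return downstream.upper().startswith(priming_site(rt_primer, n))


# Made-up primer ending in ...CA, so the complementary template site is TG.
example_primer = "GGATCCTTCA"
flag_hit = is_candidate_misprimed("TGGAAC", example_primer)
flag_miss = is_candidate_misprimed("ACGTAA", example_primer)
```

With only two bases required, a random sequence would pass this check about one time in sixteen. A usable filter would therefore need further evidence, such as the additional complementarity described in the abstract or pile-ups of read ends at the same position.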

Keywords: EZH2; GRO-seq; HITS-CLIP; RNA sequencing (RNA-seq); RNA-binding; TGIRT; artifacts; mispriming; polycomb repressive complex (PRC2); reverse transcriptase; reverse transcription; short RNA-seq.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Artifacts
Cell Line, Tumor
Computational Biology / methods
High-Throughput Nucleotide Sequencing / methods
High-Throughput Nucleotide Sequencing / standards
Humans
RNA Probes / metabolism
Reverse Transcriptase Polymerase Chain Reaction / methods
Reverse Transcriptase Polymerase Chain Reaction / standards*
Reverse Transcription
Sequence Analysis, RNA / methods
Sequence Analysis, RNA / standards*

Substances

RNA Probes

Grants and funding

R21 CA198648/CA/NCI NIH HHS/United States