Epub 2015 Nov 9.
High-throughput Sequencing of Human Plasma RNA by Using Thermostable Group II Intron Reverse Transcriptases
2016 Jan.
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling.
diagnostics; next-generation sequencing; noncoding RNA; tRNA; transcriptome profiling.
© 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
TGIRT-seq overview. (A) RNA-seq library construction via TGIRT template-switching. TGIRT template-switching reverse transcription reactions use an initial template–primer substrate comprised of an RNA oligonucleotide, which contains an Illumina Read 2 primer-binding site (R2 RNA) and has a 3′-blocking group, annealed to a complementary DNA primer (R2R DNA), which leaves an equimolar mixture of A, C, G, and T (denoted N) single-nucleotide 3′ overhangs. In the protocol used in the present work, the initial R2 RNA-R2R DNA substrate was mixed with target RNA and TGIRT enzyme in the reaction medium, with the enzyme added last, and then pre-incubated for 30 min at room temperature prior to initiating reverse transcription reactions by adding dNTPs. The reactions were incubated for 15 min at 60°C and terminated by alkaline treatment, as described in Materials and Methods. The cDNA products were then purified with a MinElute Reaction Cleanup Kit (QIAGEN) and ligated at their 3′ ends to a 5′-adenylated/3′-blocked DNA oligonucleotide complementary to an Illumina Read 1 primer (R1R) by using a Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs). The ligated cDNAs were repurified and amplified by PCR for 12 cycles to add Illumina flow cell capture sites (P5 and P7) and barcode sequences for sequencing. (B) Mapping pipeline for RNA-seq data sets constructed with TGIRT enzymes. After trimming adapter sequences and reads with low quality base calls by using cutadapt, reads of ≥18 nt were mapped by TopHat and Bowtie2 (default settings) to a human genome reference sequence (Ensembl GRCh38 release 76) supplemented with additional rRNA gene contigs and other sequences (Pass 1) (see Materials and Methods). Unmapped reads from Pass 1 were then remapped to the same human genome reference sequence using Bowtie2 local alignment (default settings) to recover reads from RNAs with post-transcriptionally added nucleotides [e.g., 3′ CCA, poly(U)] or short introns (e.g., tRNA introns; Pass 2). Concordant read pairs that mapped uniquely with MAPQ ≥15 from Passes 1 and 2 were combined and mapped to genomic features. Reads that mapped to tRNA genes were filtered and combined with the reads that remained unmapped after the Bowtie2 local alignment, and remapped to human tRNA reference sequences (UCSC genome browser website) to achieve optimal recovery and mapping of tRNA reads. tRNA reads with MAPQ ≥1 were combined with mapped genome reads from the prior steps for downstream analysis.
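The routing logic of the mapping pipeline in panel B can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the caption (read length ≥18 nt, MAPQ ≥15 for genome alignments, MAPQ ≥1 for tRNA reference alignments); it does not invoke cutadapt, TopHat, or Bowtie2. The `ReadPair` record, the `route` function, and the bin names are hypothetical. The caption does not say what happens to genome alignments below MAPQ 15, nor whether tRNA-gene reads were taken only from alignments that passed that threshold, so the handling of both cases here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MIN_READ_LEN = 18     # reads of >=18 nt are mapped (from the caption)
MIN_GENOME_MAPQ = 15  # threshold for Passes 1 and 2 (from the caption)
MIN_TRNA_MAPQ = 1     # threshold for the tRNA reference (from the caption)


@dataclass
class ReadPair:
    """Hypothetical summary of one read pair's alignment outcomes."""
    name: str
    length: int                       # length after trimming
    pass1_mapq: Optional[int] = None  # Pass 1 MAPQ; None means unmapped
    pass2_mapq: Optional[int] = None  # Pass 2 (local) MAPQ; None means unmapped
    hits_trna_gene: bool = False      # genome alignment overlaps a tRNA gene
    trna_mapq: Optional[int] = None   # MAPQ against tRNA references


def route(read: ReadPair) -> str:
    """Return the bin a read pair would end up in under the stated thresholds."""
    if read.length < MIN_READ_LEN:
        return "too_short"
    # Pass 2 is only consulted for reads left unmapped by Pass 1.
    mapq = read.pass1_mapq if read.pass1_mapq is not None else read.pass2_mapq
    if mapq is not None and not read.hits_trna_gene:
        # Assumption: genome alignments below the threshold are set aside.
        return "genome" if mapq >= MIN_GENOME_MAPQ else "low_mapq"
    # tRNA-gene reads and reads still unmapped go to the tRNA references.
    if read.trna_mapq is not None and read.trna_mapq >= MIN_TRNA_MAPQ:
        return "trna"
    return "unmapped"
```

In this sketch a pair that aligns in Pass 1 never reaches Pass 2, matching the caption's statement that only unmapped reads from Pass 1 were remapped with local alignment.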
Bioanalyzer traces showing size profiles of plasma RNAs before and after various treatments. Total plasma RNA was prepared by the Direct-zol method, and a 1-µL portion was analyzed with an RNA 6000 Pico Kit (mRNA assay) on a 2100 Bioanalyzer (Agilent) to obtain the traces shown in the figure. (A) Total plasma RNA with no further treatment (NT). (B) Total plasma RNA after on-column DNase I treatment (OCD). (C, D) Total plasma RNA after OCD treatment followed by RNase I or alkaline hydrolysis treatments, respectively.
Percentage of TGIRT-seq reads from total plasma RNA data sets mapping to different categories of genomic features. RNA-seq data sets were constructed by using TeI4c RT for total plasma RNA prepared by the Direct-zol method and either not treated (NT; combined DS1–3), 3′ dephosphorylated (–3′ P; combined DS4–6), or on-column DNase I-treated (OCD; combined DS7–10). Reads were mapped to genomic features as described in Materials and Methods. (A) Stacked bar graphs showing the percentage of concordant read pairs that mapped uniquely in the correct orientation to the indicated category of genomic features. Protein-coding genes include immunoglobulin and T-cell receptor genes; long ncRNAs include lincRNAs, antisense RNAs and other lncRNAs; and rRNA genes include 5S, 5.8S, 18S, and 28S rRNA genes. (B) Stacked bar graphs showing the percentage of small ncRNA read pairs (1.8%–5.8% of the reads in the total plasma RNA data sets) that mapped to different categories of small ncRNA genes. In A and B, the numbers next to each stacked bar segment indicate the number of different genes for which transcripts were identified in that category. Only features with ten or more mapped reads in the combined data sets were included. (MT) Mitochondrial genes.
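The per-category summary described in this caption (percentage of reads per category, plus the number of genes detected in each) can be illustrated with a minimal Python sketch. The function name `summarize_categories` and the input layout are hypothetical, and the sketch assumes that percentages are computed over the features that pass the ten-read cutoff, which the caption does not state explicitly.

```python
from collections import defaultdict


def summarize_categories(features, min_reads=10):
    """Summarize read counts per feature category.

    `features` maps a gene name to a (category, read_count) tuple.
    Returns {category: (percent_of_reads, number_of_genes)}, counting
    only genes with at least `min_reads` mapped reads.
    """
    reads_per_category = defaultdict(int)
    genes_per_category = defaultdict(int)
    for category, count in features.values():
        if count < min_reads:
            continue  # below the ten-read cutoff from the caption
        reads_per_category[category] += count
        genes_per_category[category] += 1

    total = sum(reads_per_category.values())
    return {
        category: (100.0 * reads / total, genes_per_category[category])
        for category, reads in reads_per_category.items()
    }
```

For example, with two protein-coding genes at 30 and 5 reads and one lncRNA at 10 reads, only the 30-read and 10-read genes are kept, giving 75% and 25% of the retained reads with one gene in each category.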
Human plasma RNA is enriched in intron and antisense sequences compared with whole-cell RNAs. Reads mapping to protein-coding genes were analyzed to assess coverage across different regions and both DNA strands in RNA-seq data sets constructed with TGIRT enzymes for total plasma or whole-cell RNA prepared and treated in different ways. These include plasma RNA prepared by the Direct-zol method with no further treatment (NT; combined DS1–3), after on-column DNase I treatment (OCD; combined DS7–10), or after Baseline-ZERO DNase treatment (BZD; DS11); plasma RNA prepared by the mirVana combined method after Baseline-ZERO DNase treatment (M-BZD; DS16); and ribo-depleted and fragmented whole-cell RNA from Jurkat cells (TeI4c RT; DS18) or K562 cells (GsI-IIC RT; DS19). (A) Stacked bar graphs showing the percentage of bases in protein-coding gene reads that mapped to coding sequences (CDS), introns, 5′- and 3′-untranslated regions (UTRs), and intergenic regions. (B) Stacked bar graphs showing the proportion of concordant read pairs that mapped to the sense and antisense strands of protein-coding genes. In A and B, reads that mapped to protein-coding genes were filtered to remove those with >50% of the read length overlapping embedded small ncRNAs, and the percentage of bases or reads mapping to different regions or strands was calculated by using picard tools. Reads from the OCD, BZD, and M-BZD data sets were analyzed with or without removal of read pairs with a span of <30 nt to exclude short DNA fragments that may have escaped DNase treatment.
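The two read filters mentioned in this caption (removing reads with >50% of their length overlapping an embedded small ncRNA, and optionally removing read pairs with a span of <30 nt) can be sketched as follows. This is an illustrative reimplementation with hypothetical function names, using half-open `(start, end)` coordinates on a single chromosome; it is not the authors' code, and it assumes the overlap is assessed against each small ncRNA separately.

```python
def overlap_fraction(read, feature):
    """Fraction of the read's length covered by one feature.

    Both arguments are half-open (start, end) intervals on the same
    chromosome, with end > start.
    """
    read_start, read_end = read
    feat_start, feat_end = feature
    overlap = max(0, min(read_end, feat_end) - max(read_start, feat_start))
    return overlap / (read_end - read_start)


def keep_read(read, small_ncrnas, max_overlap=0.5, min_span=None):
    """Return True if the read passes both filters.

    `min_span=None` disables the span filter, mirroring the caption's
    "with or without removal" of pairs spanning fewer than 30 nt.
    """
    start, end = read
    if min_span is not None and end - start < min_span:
        return False  # possible short DNA fragment
    # Drop the read only if MORE than max_overlap of it lies in a small ncRNA.
    return all(overlap_fraction(read, f) <= max_overlap for f in small_ncrnas)
```

A read covering positions 100 to 200 is kept when a small ncRNA covers exactly half of it and dropped when the covered portion rises to 60%.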
Human plasma contains both mature and pre-miRNAs. (A) Relative abundance of miRNAs identified in RNA-seq data sets constructed with TeI4c RT for total plasma RNAs prepared by the Direct-zol method with on-column DNase I treatment (OCD; combined DS7–10; left) or by the mirVana combined method with Baseline-ZERO DNase treatment (M-BZD; DS16; right). miRNA loci with 10 or more mapped reads were rank-ordered by read count and plotted to display relative abundance. The 20 most abundant miRNA loci by read count are shown in the bar graph insets. Loci encoding predicted miRNAs (Ensembl GRCh38 Release 76) were not included in the bar graphs unless mature-sized miRNAs mapping to the locus were identified in the data sets. (B, C) IGV screen shots showing coverage plots (CP; above) and alignments (below) of reads for loci in which abundant miRNA transcripts were identified in the OCD and M-BZD data sets, respectively. In B, the miRNA transcripts were ordered based on abundance as shown in the left panel of A. (C) IGV screen shots showing additional miRNA transcripts that were abundant in the M-BZD data set, but less abundant or not present in the OCD data sets. The arrow at the top indicates the boundaries and 5′–3′ orientation of the mature miRNA on the chromosomal DNA sequence. Reads were sorted by the start site on the chromosome, which can be from either the 5′ or 3′ end depending on the orientation of the gene on the chromosome. Nucleotides matching the genome sequence are shown in gray, and mismatches are shown as different colors (A, green; C, blue; G, brown; and T, red), which can either correspond to or be the complement of the RNA sequence depending on the orientation of the gene on the chromosome. Mismatches were checked against NCBI dbSNP, and known SNPs are indicated with the nucleotide change and corresponding SNP ID. Mismatches at the 5′ end of the reads are likely due to nontemplated nucleotide addition by the TGIRT enzyme to the 3′ end of the cDNAs. Some miRNAs (e.g., miR-122) have post-transcriptionally added A or AA residues at their 3′ ends (Norbury 2013).
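The rank-ordering used for panel A (loci with 10 or more mapped reads sorted by read count, with the 20 most abundant shown in the insets) is simple enough to sketch directly. The function name `rank_loci` is hypothetical, and breaking ties alphabetically by locus name is an assumption made here only to keep the output deterministic.

```python
def rank_loci(read_counts, min_reads=10, top_n=20):
    """Rank loci by read count.

    `read_counts` maps a locus name to its mapped-read count.
    Returns (ranked, top): all loci with at least `min_reads` reads in
    descending order of count, and the first `top_n` of them.
    """
    kept = [(locus, n) for locus, n in read_counts.items() if n >= min_reads]
    # Sort by descending count; ties are broken by locus name (assumption).
    ranked = sorted(kept, key=lambda item: (-item[1], item[0]))
    return ranked, ranked[:top_n]
```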
Tissue expression profiles for mature miRNAs in plasma. The figure shows tissue expression profiles of the mature miRNAs identified by TGIRT-seq in total plasma RNA prepared by the Direct-zol method with on-column DNase I treatment (OCD; combined DS7–10). The profiles are based on the relative RNA-seq expression values of the miRNAs in a published database (Landgraf et al. 2007), and only miRNAs present in that database are shown. Tissue categories: podocytes include both differentiated and undifferentiated podocytes; peripheral leukocytes include T-lymphocytes, NK cells, monocytes, granulocytes, and dendritic cells. miRNAs highlighted in red are also abundant (top 10 percentile) in red blood cells or platelets (Wang et al. 2012), cell types for which relative RNA-seq expression values were not available in the database used to calculate the expression profiles (Landgraf et al. 2007).
TGIRT-seq identifies full-length mature tRNAs and tRNA fragments in human plasma. (A) Relative abundance of tRNAs identified in RNA-seq data sets constructed with TeI4c RT for total plasma RNA prepared by the Direct-zol method without (NT; combined DS1–3) or with treatment to remove 3′ phosphates (–3′ P; combined DS4–6). The plots show tRNAs with 10 or more mapped reads grouped by anticodon and rank-ordered by read count. The 15 most abundant tRNAs based on anticodon are shown in the bar graph insets. (B) IGV screen shots showing coverage plots (CP; above) and alignments (below) of reads for abundant full-length mature tRNAs identified in the NT data sets. The tRNAs were ordered by abundance as in the left panel of A. For cases in which multiple loci encode tRNAs with the same sequence, tRNA reads were distributed equally among different tRNA loci for the IGV alignments. (C) IGV screen shots showing coverage plots and alignments of reads for representative 3′-tRNA halves in the NT data sets (AlaAGC and ThrCGT) and 5′-tRNA halves in the –3′ P data sets (GlyCCC, ArgCCG, and AspGTC). The arrow at the top indicates the boundaries and 5′–3′ orientation of the mature tRNA on the chromosomal DNA sequence. In order to fit the entire alignment in one panel, genes with >1000 mapped reads were down-sampled to 1000 reads in IGV. Reads were sorted by start site on the chromosome. Nucleotides matching the genome sequence are shown in gray, and mismatches are shown as different colors (A, green; C, blue; G, brown; and T, red). Mismatches at the 5′ end of the reads are likely due to nontemplated nucleotide addition by the TGIRT enzyme to the 3′ end of the cDNAs. Mismatches due to misincorporation at known sites of post-transcriptional modifications are highlighted with the name of the modification. Modifications: I, inosine; m¹A, 1-methyladenosine; m³C, 3-methylcytidine; m⁵C, 5-methylcytidine; m¹G, 1-methylguanosine; m²G, N²-methylguanosine; m²₂G, N²,N²-dimethylguanosine.
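Two bookkeeping steps in this caption lend themselves to a short Python illustration: distributing reads equally among loci that encode identical tRNA sequences, and down-sampling a locus to 1000 reads for display. Both functions below are hypothetical sketches. The caption does not say how remainders were handled when a read count is not evenly divisible, so giving the extra reads to the first loci is an assumption. The random down-sampling shown is a generic stand-in and is not IGV's own down-sampling algorithm.

```python
import random


def distribute_equally(read_count, loci):
    """Split `read_count` reads as evenly as possible among `loci`.

    Assumption: any remainder goes one read at a time to the first loci
    in the list.
    """
    base, extra = divmod(read_count, len(loci))
    return {
        locus: base + (1 if i < extra else 0)
        for i, locus in enumerate(loci)
    }


def downsample(reads, limit=1000, seed=0):
    """Return at most `limit` reads, sampled without replacement.

    A fixed seed keeps the illustration reproducible.
    """
    reads = list(reads)
    if len(reads) <= limit:
        return reads
    return random.Random(seed).sample(reads, limit)
```

With ten reads and three identical loci, the first locus receives four reads and the other two receive three each, so the total is preserved.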
Other classes of small noncoding RNAs identified as full-length mature transcripts in human plasma by TGIRT-seq. (A) IGV screen shots showing coverage plots (CP; above) and alignments (below) of reads mapping to small ncRNA loci in RNA-seq data sets constructed with TeI4c RT for total plasma RNA prepared by the Direct-zol method (NT; combined DS1–3). The RNA biotype is indicated at the top with the gene name and transcript length in parentheses. (B) Examples of small ncRNA fragments with poly(U) tails. IGV screen shots showing coverage plots (CP; above) and alignments (below) of Read 1s for poly(U)-tailed small ncRNAs found among the unmapped reads in NT data sets. In A and B, the arrow at the top indicates the boundaries and 5′ to 3′ orientation of the mature transcript on the chromosomal DNA sequence. In order to fit the entire alignment in one panel, genes with >1000 mapped reads were down-sampled to 1000 reads in IGV. Reads were sorted by start site on the chromosome, which can be from either the 5′ or 3′ end depending on the orientation of the gene on the chromosome. Nucleotides matching the genome sequence are shown in gray, and mismatches are shown as different colors (A, green; C, blue; G, brown; and T, red), which can either correspond to or be the complement of the RNA sequence. Mismatches were checked against NCBI dbSNP, and known SNPs are indicated with the nucleotide change and corresponding SNP ID. Other mismatches were manually checked and were due to lower quality base-calls, nontemplated nucleotide addition to the 3′ end of the cDNA resulting in extra nucleotides at the 5′ end of the read, or misalignment by Bowtie2 local alignment.