PLoS One. 2014 Mar 14;9(3):e91851.
doi: 10.1371/journal.pone.0091851. eCollection 2014.

Sequencing degraded RNA addressed by 3' tag counting

Benjamín Sigurgeirsson et al. PLoS One. 2014.

Abstract

RNA sequencing has become widely used in gene expression profiling experiments. Prior to any RNA sequencing experiment the quality of the RNA must be measured to assess whether or not it can be used for further downstream analysis. The RNA integrity number (RIN) is a scale used to measure the quality of RNA that runs from 1 (completely degraded) to 10 (intact). Ideally, samples with high RIN (> 8) are used in RNA sequencing experiments. RNA, however, is a fragile molecule which is susceptible to degradation and obtaining high quality RNA is often hard, or even impossible when extracting RNA from certain clinical tissues. Thus, occasionally, working with low quality RNA is the only option the researcher has. Here we investigate the effects of RIN on RNA sequencing and suggest a computational method to handle data from samples with low quality RNA which also enables reanalysis of published datasets. Using RNA from a human cell line we generated and sequenced samples with varying RINs and illustrate what effect the RIN has on the basic procedure of RNA sequencing; both quality aspects and differential expression. We show that the RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads. We introduce 3' tag counting (3TC) as a computational approach to reliably estimate differential expression for samples with low RIN. We show that using the 3TC method in differential expression analysis significantly reduces false positives when comparing samples with different RIN, while retaining reasonable sensitivity.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic drawing of the 3TC method.
Shown is a graphical representation of the annotation file used for counting. The original annotation file, shown on the left, contains all annotated exons and isoforms for all genes. The isoforms with the highest expression within a gene are indicated by a black border around their exons. The two steps of the 3TC process are shown in the middle: only the isoforms with the highest expression are kept in the annotation file (isoform filtering), and all isoforms are truncated to a specific length N (length restriction, shown by shading). The final annotation file is shown on the right.
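The two steps in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that exons are given as 1-based inclusive genomic intervals, that the length restriction keeps the N nucleotides nearest the 3' end (as the method's name suggests), and that exactly one isoform per gene is kept. The names `truncate_to_3prime` and `three_tag_annotation` and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def truncate_to_3prime(exons: List[Exon], strand: str, n: int) -> List[Exon]:
    """Length restriction: keep only the n transcript nucleotides nearest the 3' end."""
    # Visit exons starting at the 3' end: highest coordinates first on the
    # '+' strand, lowest coordinates first on the '-' strand.
    ordered = sorted(exons, reverse=(strand == "+"))
    kept: List[Exon] = []
    remaining = n
    for start, end in ordered:
        if remaining <= 0:
            break
        length = end - start + 1
        if length <= remaining:
            kept.append((start, end))          # whole exon fits
            remaining -= length
        elif strand == "+":
            kept.append((end - remaining + 1, end))    # keep the 3' part
            remaining = 0
        else:
            kept.append((start, start + remaining - 1))  # keep the 3' part
            remaining = 0
    return sorted(kept)


def three_tag_annotation(
    isoforms: Dict[str, Dict[str, Tuple[str, List[Exon]]]],
    expression: Dict[str, float],
    n: int,
) -> Dict[str, Tuple[str, str, List[Exon]]]:
    """Isoform filtering followed by length restriction.

    isoforms maps gene -> {isoform_id: (strand, exons)} (hypothetical layout);
    expression maps isoform_id -> expression estimate.
    """
    result = {}
    for gene, candidates in isoforms.items():
        # Isoform filtering: keep the most highly expressed isoform of the gene.
        best = max(candidates, key=lambda iso: expression.get(iso, 0.0))
        strand, exons = candidates[best]
        result[gene] = (best, strand, truncate_to_3prime(exons, strand, n))
    return result
```

Reads would then be counted against the truncated exons only, so that each gene contributes a region of at most N nt regardless of its full transcript length.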
Figure 2. Effects of degradation on the RNA size distribution for different RINs.
The large ribosomal RNA peaks, 18S and 28S, decrease steadily with decreasing RIN and disappear completely at RIN 2. Also apparent is an increase in small molecules with decreasing RIN, which is especially noticeable for the RIN 2 sample. The dotted inset shows how magnesium ions affect the RIN of the RNA as a function of temperature and incubation time.
Figure 3. Attributes of the experimental groups.
The height of each bar represents the average RIN for that group, and error bars denote the standard error. The number of samples in each group is shown at the bottom of each bar. The group names below each bar give both the average RIN and the enrichment method used for the samples in the group. One of the RIN 8 samples failed during library preparation, so the subsequent sequencing data contain only two samples in the RIN 8 group.
Figure 4. Preprocessing of sequencing data.
The barplot shows how many reads survive each step of the preprocessing pipeline (see Methods). The step from Mapped reads to Useable reads is the removal of rRNA reads. A large number of reads are lost to rRNA read removal in the RiboMinus group. The percentage of useable reads (shown above the dotted lines) declines steadily with decreasing RIN. The poor performance of the RiboMinus samples can be attributed to high rRNA contamination.
Figure 5. Gene body coverage on average for each group.
Both RIN 10 and RiboMinus show even coverage. The percentages in parentheses show the proportion of reads that map closer to the 3' end than to the 5' end, i.e. the proportion of reads that map to the right of the dashed vertical line. Each step of decreasing RIN shows an increase in 3' bias.
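The percentage described in this caption can be sketched as follows. This is an illustration only: it assumes each read has already been assigned a relative position along its gene body, from 0 (5' end) to 1 (3' end), and that the dashed vertical line sits at the midpoint. The function name `three_prime_fraction` is hypothetical.

```python
from typing import Iterable


def three_prime_fraction(relative_positions: Iterable[float]) -> float:
    """Fraction of reads whose relative gene-body position is past the midpoint.

    Positions run from 0.0 (5' end) to 1.0 (3' end); a read counts as
    3'-biased when it lies strictly closer to the 3' end than to the 5' end.
    """
    positions = list(relative_positions)
    if not positions:
        raise ValueError("at least one read position is required")
    closer_to_3prime = sum(1 for p in positions if p > 0.5)
    return closer_to_3prime / len(positions)
```

With perfectly even coverage the value would be close to 0.5; values above that indicate 3' bias.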
Figure 6. Differential expression of degraded RNA.
(a) A common feature of the differential expression profiles is that long transcripts tend to be more highly expressed in the group with higher RIN and, conversely, short transcripts tend to be more highly expressed in the group with lower RIN. Shown here is the expression profile for the comparison RIN 10 vs. RIN 8, with log2 of the fold change (fold change = expr(RIN 8)/expr(RIN 10)) on the y-axis and transcript length on the x-axis. (b) The DEGs shown in (a) are split into two groups: those with higher expression in RIN 10 (red) and those with higher expression in RIN 8 (blue). The average transcript length in the RIN 10 group is significantly higher than the average transcript length in the RIN 8 group (Student's t-test, p < 0.001). Error bars denote the standard error. The distribution of these gene lengths is shown in Figure S6. (c) Expression profile of the comparison RIN 10 vs. RiboMinus (RM). In total there are 3778 DEGs, with 2081 upregulated in the RM group and 1697 upregulated in the RIN 10 group. Some of the genes upregulated in the RM group show markedly high fold change. Many of those, marked with a circle, are histone genes. The transcripts of histone genes lack a poly A tail, which explains why they show markedly higher expression in the samples prepared with ribosomal depletion than in samples prepared with poly A selection. Additionally, genes that show a similar trend have been marked with a triangle. These data indicate that those genes may lack or have repressed polyadenylation sites.
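As a small worked example of the y-axis quantity in panel (a), the sketch below computes the log fold change from the ratio given in the caption. The base-2 logarithm is an assumption here, and the function name `log2_fold_change` is hypothetical.

```python
import math


def log2_fold_change(expr_rin8: float, expr_rin10: float) -> float:
    """log2 of fold change, with fold change = expr(RIN 8) / expr(RIN 10).

    Positive values mean higher expression in the RIN 8 group,
    negative values mean higher expression in the RIN 10 group.
    """
    if expr_rin8 <= 0 or expr_rin10 <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(expr_rin8 / expr_rin10)
```

Under this convention a long transcript that loses coverage in the degraded sample would fall below zero, and a short transcript that gains relative coverage would fall above zero.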
Figure 7. Overlap of DEGs between the first three comparisons from Table 1.
The majority of the DEGs found in the comparison RIN 10 vs. RIN 8 are also found in the other two comparisons. While the number of DEGs does not increase when comparing RIN 10 to RIN 6 and RIN 4, the overlap between those two comparisons is considerable.
Figure 8. Effects of 3TC method on differential expression.
The y-axis shows the percentage of DEGs and the x-axis shows the sensitivity. The colors denote different comparisons while different shapes of points denote the varying N used in the length restriction process. (a) All RIN 10 comparisons. All comparisons demonstrate a sharp decrease in DEGs going from no length restriction to N = 1500 nt. Lowering N further results in fewer DEGs but at the expense of sensitivity. The control, which compares two different cell lines, does not show any abrupt decrease in DEGs but rather follows a straight line. (b) All non-RIN10 comparisons. All comparisons, except RIN 4 vs. RIN 2, show improvement with the 3TC (N = 1500) method. This is even true for the two comparisons, RIN 8 vs. RIN 6 and RIN 6 vs. RIN 4, where originally there were very few DEGs. [DEGs  =  differentially expressed genes].
Figure 9. Gene body coverage for transcripts longer than 5000 nt.
For transcripts longer than 5000 nt [...] Figure 5. These biases may explain why the 3TC method decreases the number of false positives found in the RIN 10 vs. RM comparison, as shown in Figure 8a.

Grants and funding

This work was supported by the Swedish Research Council, the Knut and Alice Wallenberg Foundation, and Science for Life Laboratories, National Genomics Infrastructure (NGI), Sweden. The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.