BMC Bioinformatics. 2016 Feb 3;17:58.
doi: 10.1186/s12859-016-0922-z.

Measure transcript integrity using RNA-seq data

Liguo Wang et al. BMC Bioinformatics. 2016.

Abstract

Background: Stored biological samples with pathology information and medical records are invaluable resources for translational medical research. However, RNAs extracted from the archived clinical tissues are often substantially degraded. RNA degradation distorts the RNA-seq read coverage in a gene-specific manner, and has profound influences on whole-genome gene expression profiling.

Result: We developed the transcript integrity number (TIN) to measure RNA degradation. Applied to 3 independent RNA-seq datasets, TIN proved to be a reliable and sensitive measure of RNA degradation at both the transcript and sample level. By comparing 10 prostate cancer clinical samples with lower RNA integrity to 10 samples with higher RNA quality, we demonstrated that calibrating gene expression counts with TIN scores could effectively neutralize RNA degradation effects, reducing false positives and recovering biologically meaningful pathways. When we further evaluated TIN correction using spike-in transcripts in RNA-seq data generated by the Sequencing Quality Control consortium, TIN adjustment gave better control of false positives and false negatives (sensitivity = 0.89, specificity = 0.91, accuracy = 0.90) than gene expression analysis without TIN correction (sensitivity = 0.98, specificity = 0.50, accuracy = 0.86).

Conclusion: TIN is a reliable measurement of RNA integrity and a valuable approach for neutralizing the effects of in vitro RNA degradation and improving differential gene expression analysis.
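
The sensitivity, specificity, and accuracy values quoted in the Result paragraph are standard confusion-matrix summaries. The abstract does not give the underlying counts, so the sketch below uses purely hypothetical numbers (not taken from the paper) to show how the three quantities are derived; the function name `classification_metrics` is likewise an illustrative choice.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: truly differential transcripts that were called / missed.
    tn/fp: truly unchanged transcripts that were left alone / wrongly called.
    """
    sensitivity = tp / (tp + fn)                 # fraction of true positives recovered
    specificity = tn / (tn + fp)                 # fraction of true negatives left uncalled
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all calls that are correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only; they are NOT the paper's data.
sens, spec, acc = classification_metrics(tp=45, fp=10, tn=40, fn=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Read this way, the uncorrected analysis in the abstract trades a very high sensitivity (0.98) for a specificity of 0.50, meaning half of the truly unchanged spike-ins were called as changed.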

Figures

Fig. 1
Evaluating median TIN score (medTIN) metric using RIN and gene body read coverage. a Scatterplot showing correlation between the medTIN and the corresponding RIN score for 12 GBM samples. Black dashed line is the linear regression line fitted to data. b Scatterplot showing correlation between the medTIN and the corresponding RIN score for 120 mCRPC samples. Black dashed line is the linear regression line fitted to data. c Gene body coverage profiles for 12 GBM samples. Samples were ranked from top to bottom on the y-axis in the decreasing order of medTIN. Numbers in parentheses are the corresponding RIN scores. d Gene body coverage profiles for 120 mCRPC samples. Samples were ranked from top to bottom on the y-axis in the decreasing order of medTIN. r stands for Pearson’s correlation coefficient; ρ stands for Spearman’s correlation coefficient
Fig. 2
Evaluating median TIN score (medTIN) and RIN metric using sample level average RNA fragment size. The average RNA fragment size of a sample was estimated from all read pairs that uniquely mapped to the reference genome (see Methods). a Correlation between RIN score and the average RNA fragment size for 12 GBM samples. b Correlation between medTIN and average RNA fragment size for 12 GBM samples. c Correlation between RIN score and average RNA fragment size for 120 mCRPC samples. d Correlation between medTIN and average RNA fragment size for 120 mCRPC samples. (c-d) Samples with RIN < 3 and RIN ≥ 3 are indicated as red and blue circles, respectively. (a-d) Linear regression lines fitted to data are indicated as black dashed lines
Fig. 3
Evaluating TIN metric using transcript level RNA fragment size. The average RNA fragment size (y-axis) of a particular transcript was estimated from all the read pairs that uniquely mapped to the transcript (see Methods). a Correlation between TIN score and transcript level RNA fragment size. A single GBM sample (SRR873822; RIN = 10) was used to produce the figure. Each dot represents 50 transcripts. Red curve indicates the locally weighted polynomial regression curve. b Locally weighted polynomial regression curves for all GBM RNA-seq samples
Fig. 4
Correlation between TIN score and transcript features including (a) transcript size, (b) CDS size, (c) 3′UTR size, (d) 5′UTR size and (e) GC content. The GBM dataset was used to make these comparisons. CDS stands for coding DNA sequence. UTR stands for un-translated region
Fig. 5
Comparing Pearson correlation coefficient between gene expression (FPKM) and TIN score. a 20 mCRPC samples with 10 high RIN/TIN samples (RINmean = 7.1, RINsd = 1.6; red bars) and 10 low RIN/TIN samples (RINmean = 2.4, RINsd = 0.08; blue bars). b 6 GBM samples with 3 high RIN/TIN samples (red bars) and 3 low RIN/TIN samples (blue bars). FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads
Fig. 6
Evaluate the effect of TIN correction on gene expression. a Smoothed scatterplot showing TIN scores and raw read counts for a sample (GSM1722952) with good RNA quality with RIN = 6.7 and medTIN = 71.5 (before correction). b Smoothed scatterplot showing TIN scores and raw read counts for a sample (GSM1722948) with poor RNA quality with RIN = 2.6 and medTIN = 48.9 (before correction). c Smoothed scatterplot showing TIN scores and corrected read counts (using loess regression) for the sample with good RNA quality (after correction). d Scatterplot showing TIN scores and corrected read counts (using loess regression) for the sample with poor RNA quality (after correction). Loess and linear regression trends are indicated as yellow (solid) and red (dashed) curves, respectively
Fig. 7
Compare TIN correction to 3′ tag counting method (3TC). a-b Percentage of retained reads if only the 3′ 1 Kb, 0.5 Kb and 0.25 Kb were considered. c Read coverage profiles for high RIN (blue) and low RIN (red) mCRPC samples. All transcripts were aligned to the 3′ end (i.e., the transcription end site). d Venn diagram showing the overlap between differentially expressed genes (DEGs) detected by TIN correction and 3TC
Fig. 8
Compare median TIN score (medTIN) with mRIN using RNA-seq data from 12 GBM samples. a Concordance between medTIN and mRIN when measuring sample level RNA integrity. b-c Compare medTIN and mRIN to Agilent’s RIN. d-e Compare medTIN and mRIN to average RNA fragment size calculated from read pairs. r stands for Pearson’s correlation coefficient
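
Several of the captions above compare a sample-level medTIN (the median of the per-transcript TIN scores) against RIN using Pearson's r and Spearman's ρ. The following is a minimal sketch of that comparison, assuming the per-transcript TIN scores have already been computed by some other tool. The sample names, TIN values, and RIN values are hypothetical placeholders rather than data from the paper, and the helper names (`pearson`, `ranks`, `spearman`) are illustrative.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson's correlation coefficient r between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-transcript TIN scores and RIN values (illustration only).
tin_by_sample = {
    "sample_A": [82.1, 75.4, 90.3, 68.0],
    "sample_B": [60.2, 55.9, 71.3, 48.7],
    "sample_C": [40.5, 35.2, 52.8, 30.1],
}
rin_by_sample = {"sample_A": 8.9, "sample_B": 6.1, "sample_C": 2.8}

names = sorted(tin_by_sample)
med_tin = [median(tin_by_sample[n]) for n in names]  # one medTIN per sample
rin = [rin_by_sample[n] for n in names]

print("medTIN:", dict(zip(names, med_tin)))
print("Pearson r:", round(pearson(med_tin, rin), 3))
print("Spearman rho:", round(spearman(med_tin, rin), 3))
```

Using the median rather than the mean keeps the sample-level score from being dominated by a small number of transcripts with extreme TIN values.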
