Genome Med. 2019 Jan 19;11(1):2.
doi: 10.1186/s13073-019-0614-1.

Reconstruction of full-length circular RNAs enables isoform-level quantification

Yi Zheng et al. Genome Med.

Abstract

Currently, circRNA studies are shifting from the identification of circular transcripts to understanding their biological functions. However, such efforts have been limited by the difficulty of determining full-length circRNA sequences at scale and by the inability to quantify circRNAs accurately at the isoform level. Here, we propose a new feature, reverse overlap (RO), for circRNA detection, which outperforms back-splice junction (BSJ)-based methods in identifying low-abundance circRNAs. By combining RO and BSJ features, we present a novel approach for effective reconstruction of full-length circRNAs and isoform-level quantification from the transcriptome. We systematically compared BSJ-level and isoform-level differential expression analyses using human liver tumor and normal tissues, and we highlight the necessity of extending circRNA studies to isoform-level resolution. The CIRI-full software can be accessed at https://sourceforge.net/projects/ciri .

Keywords: Alternative splicing; Circular RNA (circRNA); Isoform quantification; Transcript reconstruction.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Workflow of reverse overlap detection and full-length circular RNA reconstruction. RO, reverse overlap; BSJ, back-spliced junction; FSJ, forward-spliced junction. a RO is an overlapping region in amplified circular transcripts in which the 5′ or 3′ ends of paired reads reversely overlap each other. The presence of a 5′ RO indicates that the paired reads are derived from a circular transcript. The presence of both 5′ and 3′ ROs indicates that a full-length circular transcript can be generated by merging the 5′ and 3′ overlapping sequences of the read pair. b Alignment of a read pair with a 5′ RO and/or a 3′ RO. c, d Candidate RO-merged reads are mapped to the reference genome to accurately determine their locations and to rule out contamination. The longest alignment is chosen as an anchor for locating the reads (c). Unmapped and abnormally mapped fragments of the candidate RO-merged reads are realigned to the reference genome based on the location of the anchor alignment, and the alignment boundaries are then adjusted according to the GT/AG splicing signal (d). e Workflow of full-length circRNA reconstruction. ROs, BSJs, and cirexons are first detected from RNA-seq data. Full-length circRNAs can be reconstructed when both 5′ and 3′ ROs are present or when the circRNAs are completely covered by BSJ reads. For circRNAs lacking 3′ ROs or FSJs, a combined assembly is performed to integrate the 5′ RO reads and the BSJ reads. f Isoform-level quantification of circRNAs. The BSJ and RO-merged reads are aligned to the reference genome. A forward splice graph (FSG) that records splicing and coverage information is built from the alignments. Next, the resulting FSG is dissected into paths that represent putative circular isoforms of the circRNA (right panel). Paths that contain a phasing FSJ (a splicing event used by only one circular isoform) or co-occurring FSJs (multiple splicing events supported by the same RO read) are classified as phased isoforms. The read coverage profile of each path is modeled by Monte Carlo simulation (right middle panel). Expressed circular isoforms are dissected and quantified by an approximate exhaustive search algorithm (bottom).
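The RO idea in panel a can be illustrated with a toy Python sketch. It is not CIRI-full's implementation: it assumes error-free reads, exact string matching, a hypothetical minimum overlap `min_len`, and one reading of the caption, namely that a 5′ RO means the 5′ end of mate 2 (viewed in mate 1's orientation) wraps around and overlaps the 5′ end of mate 1, which can only happen when the sequenced fragment is longer than the circle. The helper names (`revcomp`, `longest_suffix_prefix_overlap`, `reconstruct_from_pair`) are hypothetical, and the genome realignment and GT/AG boundary adjustment of panels c and d are omitted.

```python
import random
from typing import Optional

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`; 0 if none reaches min_len."""
    min_len = max(min_len, 1)  # k == 0 would compare all of `left` with ""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def reconstruct_from_pair(read1: str, read2: str, min_len: int = 15) -> Optional[str]:
    """Toy full-length reconstruction from one read pair (hypothetical helper).

    read1 is on the forward strand; read2 is its mate as sequenced (reverse strand).
    Returns one rotation of the circle, or None if either overlap is missing.
    """
    mate2 = revcomp(read2)  # put mate 2 in read 1's orientation
    # 5' RO: the 5' end of mate 2 wraps around onto the 5' end of read 1.
    ro5 = longest_suffix_prefix_overlap(mate2, read1, min_len)
    if ro5 == 0:
        return None  # no evidence of a circular template
    # 3' overlap: the 3' ends of the mates overlap, so the pair can be merged.
    ro3 = longest_suffix_prefix_overlap(read1, mate2, min_len)
    if ro3 == 0:
        return None  # 5' RO only: the middle of the template is unobserved
    merged = read1 + mate2[ro3:]  # the whole sequenced fragment
    return merged[: len(merged) - ro5]  # drop the wrapped-around bases


# Toy check: a 180-nt fragment copied from a 150-nt circle, sequenced as 2 x 100 nt.
rng = random.Random(7)
circle = "".join(rng.choice("ACGT") for _ in range(150))
fragment = (circle * 3)[37:37 + 180]  # the template wraps around the circle
read1, read2 = fragment[:100], revcomp(fragment[-100:])
full_length = reconstruct_from_pair(read1, read2)
```

In the toy check, the 180-nt fragment over a 150-nt circle yields a 30-nt 5′ RO and a 20-nt 3′ overlap; trimming the 30 wrapped-around bases from the merged 180-nt fragment leaves one 150-nt copy of the circle. Real reads carry sequencing errors and repeats, so exact matching as used here would not be adequate in practice.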
Fig. 2
Performance evaluation of the RO approach to circRNA identification. a, b Performance comparison between the RO approach and BSJ-based tools. “RO only” represents the circRNAs identified only by the RO approach. a CircRNA detection rate on four data sets with different circRNA depths. b CircRNA detection rate on four data sets with different read lengths. c Composition of circular RNAs and circular RNA reads detected by RO in simulated data (5X, paired-end 200 bp). d Base depth distribution of BSJ reads (pink) and RO reads (green) on normalized circular RNAs. e, f Accuracy evaluation of the AS event-based and FSG-based quantification algorithms (CIRI-AS vs. CIRI-full) using simulated circRNA-containing transcriptomic data sets with different sequencing depths (e) and read lengths (f). g Sensitivity evaluation of the FSG-based quantification algorithm on simulated circRNA-containing transcriptomic data sets, in which each circRNA contains three isoforms with different abundances. The bar plot at the top right displays the number of isoforms detected in 994 circRNAs; the bar plot at the bottom right shows the accuracy of FSG quantification for the three types of isoforms. Accuracy rate is defined as the percentage of isoforms that are fully reconstructed and whose predicted relative abundance matches the ground truth (difference smaller than 20%). h, i Accuracy distribution of the FSG method on the three types of reconstructed isoforms. j Experimental validation of the FSG-based isoform quantification algorithm. The x- and y-axes represent the relative abundance of circRNA isoforms determined by qPCR and by the FSG-based algorithm, respectively. Each dot represents a circRNA isoform, and dots of the same color come from the same circRNA.
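The accuracy rate defined for panel g can be written as a small Python function. This sketch assumes that "difference smaller than 20%" means an absolute difference below 0.2 in relative abundance (a ±0.2 threshold is also used in Fig. 5d); the `IsoformCall` record, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IsoformCall:
    """Hypothetical record for one simulated isoform."""
    fully_reconstructed: bool   # was the complete isoform structure recovered?
    predicted_abundance: float  # estimated relative abundance within its circRNA (0-1)
    true_abundance: float       # ground-truth relative abundance (0-1)


def accuracy_rate(calls: Iterable[IsoformCall], tol: float = 0.2) -> float:
    """Percentage of isoforms that are fully reconstructed AND quantified within `tol`."""
    calls = list(calls)
    if not calls:
        return 0.0
    hits = sum(
        1
        for c in calls
        if c.fully_reconstructed and abs(c.predicted_abundance - c.true_abundance) < tol
    )
    return 100.0 * hits / len(calls)


example = [
    IsoformCall(True, 0.50, 0.45),   # counted: reconstructed, off by 0.05
    IsoformCall(True, 0.10, 0.40),   # not counted: off by 0.30
    IsoformCall(False, 0.30, 0.30),  # not counted: structure incomplete
    IsoformCall(True, 0.70, 0.75),   # counted: reconstructed, off by 0.05
]
print(accuracy_rate(example))  # 50.0
```

Both conditions must hold: an isoform whose abundance is estimated perfectly still counts as a miss if its structure was not fully reconstructed.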
Fig. 3
Full-length circRNA reconstruction of HeLa cell line (a–c) and human brain (d–f) transcriptomes with RNase R + RiboMinus treatment. a, d CircRNAs reconstructed using both the RO and BSJ features. Completely reconstructed circRNAs are shown as blue-lined ovals; nearly complete and partial circRNAs are shown in orange and gray, respectively. b, e Length distribution of reconstructed circRNAs in the HeLa cell line and in human whole brain tissue. Complete, nearly complete, and partial circRNAs are shown in blue, orange, and gray, respectively. The length of partially reconstructed circRNAs was estimated from the supporting BSJ/RO reads and the sequencing depth in the RNase R-treated sample. c, f Expression levels of the different categories of circRNAs. g–j Performance of the RO feature in circRNA identification when applied to fragmented or low-quality RNA samples. The green bar indicates the data set derived from high-quality RNA (RIN = 10) without manual fragmentation, the yellow bar the data set derived from high-quality RNA (RIN = 10) with manual fragmentation, and the red bar the data set derived from low-quality RNA (RIN = 5) with manual fragmentation. g, h Comparison of the ratio of RO reads to total circRNA reads for circRNAs of different lengths. ‘**’ and ‘*’ represent P < 0.01 and P < 0.05, respectively (Mann–Whitney U test). i, j Ratio of the number of circular RNAs with RO reads to the total number of circular RNAs of a given length. k Comparison of full-length circRNA structures and the corresponding annotated exon regions in human brain tissue. The 150 most highly expressed completely reconstructed circRNAs are shown. Each line represents a circRNA with normalized length.
Fig. 4
CircRNA expression profiles in vertebrate brain tissues. a CircRNAs identified by CIRI-full in six vertebrate brain tissues (RNase R + RiboMinus treatment). The number of shared circRNAs is shown on the phylogenetic tree. The table on the right shows the RNA-seq data set size and the numbers of identified circRNAs, cirexons, intronic/intergenic circRNA fragments (ICFs), and full-length circRNAs. The histogram on the right shows the length distribution of reconstructed circRNAs; blue, orange, and gray represent complete, nearly complete, and partial circRNAs, respectively. b Overlap of highly expressed mRNAs and circRNAs in closely related species. mRNA expression is more conserved than circRNA expression between closely related species (human vs. macaque, mouse vs. rat). c Expression levels of circRNAs and of their corresponding mRNA genes. a, b, c, and d denote the four ancestral nodes shown in panel a. Species-specific circRNAs in the four species (shown in blue) have much lower expression levels than the shared circRNAs present at ancestral nodes. d Percentage of circRNAs (BSJ ≥ 10 reads) containing four types of alternative splicing events. e Expression profiles of circRNA isoforms in the six species. The relative abundance of circRNA isoforms was normalized to the range 0 to 1.
Fig. 5
Sequencing length affects circRNA identification but not quantification. Gray bars represent circRNAs identified in both data sets, and red bars represent circRNAs detected only in the PE250 data set. a Number of circRNAs detected by CIRI-full in the PE250 and PE100 data sets. b Length distribution of reconstructed circRNA isoforms. c Number of reconstructed isoforms at different expression levels. d Difference in the relative expression levels of circRNA isoforms estimated from the PE250 and PE100 data sets. Different circRNA expression levels are shown in different colors. The two vertical dashed lines mark the threshold for the relative abundance difference between PE100 and PE250 (± 0.2). The ratios in the panel represent the percentage of accurately quantified circRNA isoforms.
Fig. 6
Differential circRNA isoform expression between normal and tumor liver tissues of 20 HCC patients. a Schematic comparison between BSJ-level and isoform-level differential expression analysis. b Number of circRNAs detected in normal and tumor liver tissues of 20 HCC patients. Light bars represent the number of circRNAs in each sample, and dark bars represent the number of highly expressed circRNAs (> 1 BSJ read per 10 million reads). c Number of circRNAs containing AS events detected by CIRI-AS. Light bars represent circRNAs with one AS event, and dark bars represent circRNAs with more than one AS event. Black curves represent the total mapped data size of each sample. d Comparison of differential expression analysis at the BSJ level (x-axis) and the isoform level (y-axis). Each dot denotes a circRNA isoform; its size represents the expression level and its color represents its relative abundance within the parental circRNA. e CircRNAs in panel d are classified into circRNAs with only one isoform and circRNAs with multiple isoforms. For circRNAs with multiple isoforms, Venn diagrams show the discrepancies in significantly up- or downregulated isoforms between BSJ-level and isoform-level differential expression analyses. f Average isoform expression fold change between normal and tumor tissues for the 50 most highly expressed circRNAs with multiple isoforms. Black dots represent the average fold change of circRNAs under BSJ-level quantification. The dashed box highlights the example shown in panel g. g An example of an alternative splicing switch between normal and tumor samples. The circRNA chr2:207144264|207162097, located in the ZDBF2 gene, can express four isoforms. Rectangles of different colors represent the cirexons within this circRNA. The green and red histograms on each cirexon represent the normalized sequencing depth in normal and tumor samples, respectively. Curves connecting cirexons represent the forward splice junctions (FSJs) within this circRNA, and their width is proportional to read support. The red curve represents the FSJ of the dominant circular isoform. The relative abundance of the four circular isoforms is shown in the bar plot (bottom). h Expression profiles of eight circRNAs quantified at the BSJ and isoform levels in normal (left) and tumor (right) tissues across 20 HCC patients. Red and cyan lines represent the expression profiles of the major and minor circRNA isoforms, respectively. All statistical significance was assessed by the Mann–Whitney U test. “**” and “*” represent P < 0.01 and P < 0.05, respectively; “n.s.” indicates not significant.
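The path dissection described for Fig. 1f, and the cirexon/FSJ diagram in panel g, can be illustrated with a toy Python sketch. It assumes that the forward splice graph of one circRNA is acyclic, with cirexons as nodes and FSJs as directed edges, and that each putative isoform is a path from the first to the last cirexon flanking the back-splice junction. The graph below is hypothetical (it is not the ZDBF2 circRNA), the function names are hypothetical, and only the phasing-FSJ criterion is shown; co-occurring FSJs, coverage modeling, and quantification are omitted.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # an FSJ from one cirexon to the next


def enumerate_isoforms(fsg: Dict[str, List[str]], first: str, last: str) -> List[List[str]]:
    """All cirexon paths from `first` to `last` (assumes the graph is acyclic)."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == last:
            paths.append(path)
            return
        for nxt in fsg.get(node, []):
            walk(nxt, path + [nxt])

    walk(first, [first])
    return paths


def phasing_fsjs(paths: List[List[str]]) -> Set[Edge]:
    """FSJs that occur in exactly one candidate isoform path."""
    counts = Counter(edge for path in paths for edge in set(zip(path, path[1:])))
    return {edge for edge, n in counts.items() if n == 1}


# Hypothetical circRNA with four cirexons; E2 and E3 can each be skipped.
fsg = {"E1": ["E2", "E3"], "E2": ["E3", "E4"], "E3": ["E4"]}
isoforms = enumerate_isoforms(fsg, "E1", "E4")
unique = phasing_fsjs(isoforms)
phased = [p for p in isoforms if any(e in unique for e in zip(p, p[1:]))]
```

In this toy graph every path contains at least one FSJ that no other path uses, so all three candidate isoforms would be classified as phased under the caption's criterion.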
Fig. 7
Time and memory usage of CIRI-full on human brain RNA-seq data sets. The CIRI-full pipeline consists of two components, BSJ detection using CIRI2/CIRI-AS and RO detection, which are executed simultaneously. The height of each box represents the running time of the corresponding module in the pipeline. The option “-t 5” was used for the last four data sets to activate the multithreading function of CIRI2 and BWA.
