Genomics Proteomics Bioinformatics. 2019 Oct;17(5):522-534.
doi: 10.1016/j.gpb.2019.03.004. Epub 2020 Jan 31.

CircAST: Full-length Assembly and Quantification of Alternatively Spliced Isoforms in Circular RNAs

Jing Wu et al. Genomics Proteomics Bioinformatics. 2019 Oct.

Abstract

Circular RNAs (circRNAs), covalently closed continuous RNA loops, are generated from cognate linear RNAs through back splicing events, and alternative splicing events may generate different circRNA isoforms at the same locus. However, the challenges of reconstruction and quantification of alternatively spliced full-length circRNAs remain unresolved. On the basis of the internal structural characteristics of circRNAs, we developed CircAST, a tool to assemble alternatively spliced circRNA transcripts and estimate their expression by using multiple splice graphs. Simulation studies showed that CircAST correctly assembled the full sequences of circRNAs with a sensitivity of 85.63%-94.32% and a precision of 81.96%-87.55%. By assigning reads to specific isoforms, CircAST quantified the expression of circRNA isoforms with correlation coefficients of 0.85-0.99 between theoretical and estimated values. We evaluated CircAST on an in-house mouse testis RNA-seq dataset with RNase R treatment for enriching circRNAs and identified 380 circRNAs with full-length sequences different from those of their corresponding cognate linear RNAs. RT-PCR and Sanger sequencing analyses validated 32 out of 37 randomly selected isoforms, thus further indicating the good performance of CircAST, especially for isoforms with low abundance. We also applied CircAST to published experimental data and observed substantial diversity in circular transcripts across samples, thus suggesting that circRNA expression is highly regulated. CircAST can be accessed freely at https://github.com/xiaofengsong/CircAST.

Keywords: Circular RNA; Full-length reconstruction; Isoform quantification; Multiple splice graph model; Transcriptome.

Figures

Figure 1
Schematics of CircAST for circular transcript assembly and quantification A. The flow diagram of CircAST. B. Visualization of the workflow of CircAST. CircAST begins with a set of paired-end RNA-seq reads that have been mapped to the genome. It then constructs multiple splice graphs with different BSJs in a gene locus and assembles the full-length sequences of circular transcripts with the EMPC algorithm. CircAST then estimates the abundance of each assembled circular isoform by using an EM algorithm. Finally, all circular transcripts are output together with their assembled full-length sequences and abundance estimates. SAM, sequence alignment/map; BSJ, back splice junction; EMPC, extended minimum path cover; EM, expectation maximization.
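The abundance-estimation step named in this caption can be illustrated with a generic EM loop. The following is a minimal sketch and not CircAST's implementation: the function name `em_abundance`, the input layout (one tuple of compatible isoform indices per fragment), the use of fixed effective lengths, and the toy numbers are all assumptions made for illustration.

```python
def em_abundance(fragments, lengths, n_iter=200, tol=1e-9):
    """Estimate isoform read fractions with a plain EM loop (illustrative only).

    fragments: list of tuples; each tuple holds the indices of the isoforms
               a fragment is compatible with (hypothetical input layout).
    lengths:   effective length of each isoform (assumed known).
    Returns (theta, counts): read fractions and expected fragment counts.
    """
    if not fragments:
        raise ValueError("at least one fragment is required")
    k = len(lengths)
    theta = [1.0 / k] * k  # start from a uniform guess
    counts = [0.0] * k
    for _ in range(n_iter):
        # E-step: split each fragment across its compatible isoforms,
        # weighting isoform j by theta[j] / lengths[j].
        counts = [0.0] * k
        for isoforms in fragments:
            weights = [theta[j] / lengths[j] for j in isoforms]
            total = sum(weights)
            if total == 0.0:
                continue
            for j, w in zip(isoforms, weights):
                counts[j] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no fragment could be assigned")
        # M-step: the new read fraction is the share of expected counts.
        new_theta = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, counts


# Toy example: 30 fragments unique to isoform 0, 10 unique to isoform 1,
# and 20 compatible with both; the two isoforms have equal length.
fragments = [(0,)] * 30 + [(1,)] * 10 + [(0, 1)] * 20
lengths = [500.0, 500.0]
theta, counts = em_abundance(fragments, lengths)
```

In this toy case the ambiguous fragments are split in the same 3:1 ratio as the unique ones, so the estimated fractions settle near 0.75 and 0.25.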
Figure 2
Circular transcripts assembled by CircAST on adult mouse testis data A. Venn diagram comparing the circular isoforms reconstructed by CircAST and CIRCexplorer2 from 2883 mouse testis back-spliced circRNAs (No. of supporting reads ≥10) (top). Box plots of expression of circRNA isoforms from the Venn diagram, quantified by CircAST in FPKM (bottom). B. Distribution of isoform number in each back splicing event. C. Comparison of the AS events of circular transcripts with cognate linear transcripts from the same gene locus. D. Visualization of assembled circular transcripts at the gene locus Ehbp1 (Chr11:22,053,432–22,068,506). E. Visualization of assembled circular transcripts at the gene locus Pphln1 (Chr15:93,424,014–93,465,245). For panels D and E, PCR primers are annotated together with the parental gene structure and cognate linear mRNA model. F. RT-PCR and Sanger sequencing for all the circular transcripts at the gene locus Ehbp1 (Chr11:22,053,432–22,068,506). G. RT-PCR and Sanger sequencing for all the circular transcripts at the gene locus Pphln1 (Chr15:93,424,014–93,465,245). For panels F and G, white arrows point to the isoform in the RNase R-treated sample with the correct product size, which was later subjected to Sanger sequencing. There are three isoforms for each circRNA, of which circEhbp1-2-2 and circPphln1-1-1 were predicted only by CircAST and missed by CIRCexplorer2. The primer sequences are provided in Table S1. AS, alternative splicing.
Figure 3
Performance of CircAST on simulated datasets A. The sensitivity, precision, and F1 score of full-length assembly by CircAST (upper), and PCC and SCC (lower) between theoretical and estimated absolute expression levels across different circular transcripts assembled by CircAST in simulated datasets with different sequencing depths. B. The sensitivity, precision, and F1 score of full-length assembly by CircAST (upper), and Pearson and Spearman correlation coefficients (lower) between theoretical and estimated absolute expression levels across different circular transcripts assembled by CircAST in simulated datasets with different read lengths. C. PCC between the theoretical and estimated relative expression levels by CircAST for four back-spliced loci with two or more isoforms due to AS in simulated datasets. Correlation plots between the theoretical abundance and calculated read counts from junction reads or assigned reads by CircAST are shown for circPAFAH1B2-1-1 and circPAFAH1B2-1-2. D. Schematics of RI of equivalence classes to indicate the complexity of circRNAs due to the presence of overlapping regions. E. RI distribution of equivalence classes and effects of RI on the accuracy of circular transcript assembly and abundance estimation by CircAST. F. Comparison of the sensitivity, precision, and F1 score between CircAST and CIRCexplorer2 on simulated datasets with different read lengths and sequencing depths. PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient; RI, relation index.
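The sensitivity, precision, and F1 score reported in this figure follow the usual set-based definitions. The sketch below shows one way to compute them for full-length assembly; it is an assumption-laden illustration, not the authors' evaluation script. The function name `assembly_metrics`, the transcript key (chromosome, strand, tuple of exon coordinates), the requirement of an exact match, and the toy coordinates are all hypothetical.

```python
def assembly_metrics(truth, predicted):
    """Return (sensitivity, precision, F1) for assembled transcripts.

    A predicted transcript counts as correct only if its full exon
    structure matches a simulated (true) transcript exactly.
    """
    truth, predicted = set(truth), set(predicted)
    true_positives = len(truth & predicted)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


# Hypothetical transcripts keyed by (chromosome, strand, exon coordinates).
truth = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (250, 280), (300, 400))),
    ("chr2", "-", ((500, 650),)),
    ("chr3", "+", ((10, 90), (150, 240))),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (250, 280), (300, 400))),
    ("chr2", "-", ((500, 650),)),
    ("chr3", "+", ((10, 90), (160, 240))),   # wrong exon boundary
    ("chr4", "+", ((700, 800), (900, 950))),  # not in the simulated set
]
sensitivity, precision, f1 = assembly_metrics(truth, predicted)
```

Here three of four true transcripts are recovered and three of five predictions are correct, giving a sensitivity of 0.75 and a precision of 0.6.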
Figure 4
Analysis of circular transcripts assembled by CircAST in three human cell lines, mouse testis, and chicken muscle A. Venn diagram depicting the diversity of circular transcripts assembled by CircAST among the three human cell lines HeLa, HEK293, and Hs68. B. The distribution of exon number in circular transcripts in different cell lines and tissues. C. Length distribution of circular transcripts in different cell lines and tissues. D. Distribution of circular isoform number in each back splicing event, and the percentage of transcript isoforms containing all known exons between back splice sites in different cell lines and tissues. E. The abundance ratio distribution of the top two most abundant circular isoforms from circRNAs with multiple isoforms in each back splice site. F. For the most abundant isoform in each circRNA with multiple isoforms, there is a large proportion (28%–48%) of exon skipping events.
Supplementary Figure S1
Schematics of read mapping and fragment length in circRNAs A. CircAST uses the fragments located not only between the junction sites (blue) but also across the junction (red) in circular transcript assembly and quantification. B. Alignments of reads to the genome may be consistent with multiple circular transcripts. The implied length of the fragment f, denoted as lt(f), may differ between transcripts because of the prevalence of AS in circular transcripts. The circular transcripts t1 and t2 differ by an internal exon b, with the implied length lt1(f) > lt2(f). C. If the length of a circRNA is shorter than the fragment length, the two reads from the fragment termini could possibly be aligned to the same region in the circRNA. a, b, and c represent three different exons, and the site colored in black represents a BSJ.
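Panel B's point, that the same pair of genomic alignments implies different fragment lengths under different isoforms, can be made concrete with a small calculation. This is a simplified sketch under stated assumptions: the helper `implied_fragment_length` is hypothetical, coordinates are 1-based and inclusive, exons are sorted and non-overlapping, and fragments that span the BSJ are deliberately not handled.

```python
def implied_fragment_length(exons, frag_start, frag_end):
    """Count exonic bases between two genomic positions for one isoform.

    exons: sorted, non-overlapping (start, end) pairs, 1-based inclusive.
    frag_start, frag_end: outermost genomic positions of the fragment;
    both must fall inside an exon of this isoform.
    """
    if frag_start > frag_end:
        raise ValueError("frag_start must not exceed frag_end")

    def in_exon(pos):
        return any(start <= pos <= end for start, end in exons)

    if not (in_exon(frag_start) and in_exon(frag_end)):
        raise ValueError("fragment ends must lie within exons of the isoform")

    length = 0
    for start, end in exons:
        lo, hi = max(start, frag_start), min(end, frag_end)
        if lo <= hi:  # this exon overlaps the fragment span
            length += hi - lo + 1
    return length


# Toy exons a, b, c; t1 keeps the internal exon b and t2 skips it.
exon_a, exon_b, exon_c = (100, 199), (300, 349), (500, 599)
t1 = [exon_a, exon_b, exon_c]
t2 = [exon_a, exon_c]
len_t1 = implied_fragment_length(t1, 150, 549)
len_t2 = implied_fragment_length(t2, 150, 549)
```

With these toy coordinates the fragment is 150 bases long under t1 and 100 bases long under t2, matching the relation lt1(f) > lt2(f) in the caption.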
Supplementary Figure S2
Performance of CircAST on real mouse testis data with different circRNA sizes, sequencing depths, read lengths, and library sizes A. Reconstruction efficiency of CircAST with different circRNA sizes. B. Datasets on adult mouse testis with different sequencing depths, read lengths, and library sizes. Reconstruction efficiency of CircAST with different sequencing depths (C), read lengths (D), and library sizes (E).
Supplementary Figure S3
Performance of CircAST on different library sizes ranging from 500 bp to 200 bp
Supplementary Figure S4
Validation of all circRNA isoforms at different gene loci detected by CircAST Schematic models of all circRNA isoforms of Csnk1d (Chr11:120,967,995–120,973,969) (A), AW554918 (Chr18:25,339,714–25,420,075) (B), Stau2 (Chr1:16,440,323–16,509,408) (C), and Dcaf8 (Chr1:172,173,943–172,187,460) (D) are provided separately (top rows on the left). The RT-PCR results (on the right) for each circRNA isoform and the corresponding Sanger sequencing data showing correct specific splicing sites (bottom rows on the left) are provided for validation. All circRNA isoforms in the four gene loci were validated successfully except circDcaf8-1-2.
Supplementary Figure S5
Validation of the circRNA isoforms at different gene loci detected by CircAST but missed by CIRCexplorer2 Schematic models of circRNA isoforms of Cep350 (Chr1:155,953,154–155,962,560) (A), Eya3 (Chr4:132,656,693–132,673,032) (B), and Crem (Chr18:3,287,904–3,327,591) (C) are provided separately (top rows on the left). The RT-PCR results (on the right) for each circRNA isoform and the corresponding Sanger sequencing data showing correct specific splicing sites (bottom rows on the left) are provided for validation. All three isoforms were validated successfully. D. Gapdh was used as a negative control.
Supplementary Figure S6-1
Validation of the long circRNA isoforms in different gene loci predicted by CircAST The schematic models of circRNA isoforms of Mprip (Chr11:59,741,120–59,771,702) (A), Fam13b (Chr18:34,443,901–34,465,220) (B), Agbl2 (Chr2:90,791,460–90,813,352) (C), Sbno1 (Chr5:124,384,467–124,414,527) (D), Bptf (Chr11:107,043,633–107,077,798) (E), Helz (Chr11:107,592,706–107,649,340) (F), March6 (Chr15:31,478,295–31,509,822) (G), Zfp638 (Chr6:83,934,949–83,972,283) (H), and Ascc3 (Chr10:50,690,115–50,767,507) (I) are provided separately (top rows). The RT-PCR results for each circRNA isoform and the corresponding Sanger sequencing data showing correct specific splicing sites (bottom rows) are provided for validation. J. Gapdh was used as a negative control.
Supplementary Figure S6-2
Supplementary Figure S6-3
Supplementary Figure S7
Validation of the circRNA isoforms in different gene loci predicted by CIRCexplorer2 but missed by CircAST The schematic models of circRNA isoforms of Drc7 (Chr8:95,061,713–95,062,395) (A), Uggt2 (Chr14:118,994,946–119,002,986) (B), Agtpbp1 (Chr13:59,473,688–59,482,604) (C), Adam3 (Chr8:24,719,415–24,725,361) (D), Lin54 (Chr5:100,475,689–100,485,855) (E), Usp32 (Chr11:85,017,630–85,022,905) (F), Mllt10 (Chr2:18,101,458–18,126,229) (G), and Scaper (Chr9:55,912,040–55,921,338) (H) are provided separately (on the left). The RT-PCR results for each circRNA isoform showing incorrect product size are provided separately (on the right). These eight circRNA isoforms failed validation.
Supplementary Figure S8
Quantitative validation of expression of circRNA transcripts calculated by CircAST Real-time PCR analysis of the expression of circRNAs in testes from 1-week-old and 3-week-old mice was performed with circCcar or circGcl as internal controls for normalization. FPKM values of the corresponding circRNA transcripts determined by CircAST in RNA-seq data are also given. Expression levels of circAsb3-1-1 (A), circRreb1-1-1 (B), circGtsf1-4-1 (C), circPi4ka-2-1 (D), circHnrnpll-1-1 (E), circKmt2c-2-1 (F), circMap2k1-1-1 (G), and circBbs9-4-2 (H) were normalized to those from testes samples of 1-week-old mice. Comparison of circRNA expression between testes samples of 1-week-old (1W) and 3-week-old (3W) mice was performed using a two-tailed independent Student's t-test. *, P < 0.05; **, P < 0.01.
Supplementary Figure S9
Performance of CircAST on different lengths of sequencing reads trimmed from PE250 data (SRR7350933) used in CIRI-full A. With PE100 reads, CircAST reconstructed 61.42% of the circRNAs in the CIRI-full results, and its performance declined as the sequencing reads became longer. B. CircAST mostly missed intronic or intergenic circular isoforms, whereas CIRI-full mostly missed longer exonic circular isoforms with length ≥ 600 bp. The two methods therefore have different advantages in identifying various types of circular isoforms.
