Bioinformatics. 2020 Apr 1;36(7):2033-2039.
doi: 10.1093/bioinformatics/btz900.

Episo: Quantitative Estimation of RNA 5-methylcytosine at Isoform Level by High-Throughput Sequencing of RNA Treated With Bisulfite

Free PMC article

Junfeng Liu et al. Bioinformatics. 2020.


Motivation: RNA 5-methylcytosine (m5C) is a post-transcriptional modification that may be involved in numerous biological processes and in tumorigenesis. RNA m5C can be profiled at single-nucleotide resolution by high-throughput sequencing of RNA treated with bisulfite (RNA-BisSeq). However, the transcriptome-wide profile of m5C and its potential function in splicing remain to be elucidated, owing to the lack of a tool for quantifying m5C at the isoform level.

Results: We developed a computational package, named Episo, to quantify Epitranscriptomal RNA m5C at the transcript isoform level. Episo consists of three tools: mapper, quant and Bisulfitefq, for mapping, quantifying and simulating RNA-BisSeq data, respectively. The high accuracy of Episo was validated using an improved m5C-specific methylated RNA immunoprecipitation (meRIP) protocol, as well as a set of in silico experiments. By applying Episo to public human and mouse RNA-BisSeq data, we found that RNA m5C is not evenly distributed among transcript isoforms, implying that m5C may be subject to regulation at the isoform level.

Availability and implementation: Episo is released under the GNU GPLv3+ license. The source code of Episo is freely accessible from (with TopHat/Cufflinks) and (with Kallisto).

Supplementary information: Supplementary data are available at Bioinformatics online.


Fig. 1.
The Episo pipeline. (A) The mapping procedure. Incoming RNA-BisSeq reads are mapped to the reference genome and transcriptome. The output methylation file contains two columns that represent the mapped fragments and their methylation patterns. The symbols Z, X and H represent cytosines in the CpG, CHG and CHH contexts, respectively, where H in a context name can be A, C or T. Upper- and lowercase letters represent methylated and unmethylated cytosines, respectively. (B) The quantification procedure. For any given cytosine site, the total number of reads that cover the site and the number of reads that carry a methylated cytosine at that site are denoted as R and R′, respectively.
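The counting described in panel (B) can be sketched in a few lines of Python. This is only an illustration, not Episo's code: the function names, the `(start, pattern)` fragment layout, the use of `.` for non-cytosine positions, and the definition of the per-site m5C level as R′/R are all assumptions made for the example.

```python
from collections import defaultdict

# Uppercase = methylated, lowercase = unmethylated (per the caption).
METHYLATED = set("ZXH")
UNMETHYLATED = set("zxh")


def site_counts(fragments):
    """Count R (covering reads) and R' (methylated reads) per cytosine site.

    `fragments` is a hypothetical iterable of (start, pattern) pairs, where
    `pattern` has one symbol per reference base and '.' marks non-cytosines.
    """
    total = defaultdict(int)       # R
    methylated = defaultdict(int)  # R'
    for start, pattern in fragments:
        for offset, symbol in enumerate(pattern):
            position = start + offset
            if symbol in METHYLATED:
                total[position] += 1
                methylated[position] += 1
            elif symbol in UNMETHYLATED:
                total[position] += 1
    return total, methylated


def m5c_levels(fragments):
    """Per-site m5C level, assumed here to be R' / R."""
    total, methylated = site_counts(fragments)
    return {pos: methylated[pos] / total[pos] for pos in total}


# Three toy fragments covering a CpG site at 102 and a CHH site at 105.
fragments = [(100, "..Z..h"), (102, "z..H"), (102, "Z..h")]
levels = m5c_levels(fragments)
```

In this toy input, two of the three reads covering position 102 are methylated and one of the three covering position 105 is, so the sketch assigns those sites levels of 2/3 and 1/3.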
Fig. 2.
The accuracy of Episo and the distribution of estimated m5C levels in human cells and mouse tissues. (A) The distribution of the estimation errors of Episo. The difference in m5C levels between the simulated and Episo-estimated data is shown at the resolution of RNA isoforms and of single cytosines. At both resolutions, the comparisons were made at three m5C levels that cover the main range of estimates in real data from human cells and mouse tissues, as shown in (B) and (C). (B) The distribution of estimated m5C levels at the resolution of RNA isoforms. In the mouse tissues tested, the m5C level is significantly higher in brain than in the other three tissues. ‘***’ indicates a significant difference (P-value < 0.001). (C) The distribution of estimated m5C levels at the resolution of single cytosines. For both metrics, we found no significant difference among the mouse tissues tested. (D) Experimental design for PIK3R2. The unique exons of isoforms PIK3R2-002 and PIK3R2-004 are shaded in pink and the primer designs are indicated by red arrows. The unique methylated cytosine is marked with a red flag. See Supplementary Figure S3 for PRKCA and TUBGCP2. (E) Comparison of the predicted MR ratio with the observed MR ratio. The hollow points and horizontal whiskers represent the expectation and 95% confidence interval of the predicted MR rate, respectively. The Y coordinates of the hollow points are the means of the experimental data. The solid points and vertical boxplots represent the experimentally measured MR rates and their distributions across six replicates (three independent experiments with two replicates each), respectively. The diagonal line is Y = X. (Color version of this figure is available at Bioinformatics online.)
Fig. 3.
Diversity of m5C levels. (A) The distribution of expression and m5C levels among RNA isoforms of the M6PR gene in HeLa cells. The relative FPKMs were calculated as the absolute FPKMs of the isoforms divided by the average FPKM of the isoforms of M6PR. (B) The proportion of m5C-containing and m5C-variable genes. The blue bars represent the proportion of genes that have at least one m5C-containing mRNA among all protein-coding genes in human HeLa cells, four mouse tissues and simulated data (sim). The red bars represent the proportion of genes with m5C diversity CV > 1 among the genes represented by the blue bars. (C) The proportion of diversity singleton sites (CV > 1) in the genomic contexts CG, CHG and CHH. (D) The distribution of m5C diversity at isoform and single-cytosine resolution. The orange curve represents the expected distribution of CV, obtained by random shuffling of the real data. (Color version of this figure is available at Bioinformatics online.)
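The two quantities used in this caption can be illustrated with a short Python sketch. The relative FPKM follows the definition given in panel (A). The diversity measure assumes that CV denotes the coefficient of variation (standard deviation divided by mean) of m5C levels across the isoforms of a gene. The choice of the population standard deviation, the function names and the toy values are assumptions of the example, not details taken from Episo.

```python
from statistics import mean, pstdev


def relative_fpkm(fpkms):
    """Each isoform's FPKM divided by the average FPKM of the gene's isoforms."""
    average = mean(fpkms)
    return [value / average for value in fpkms]


def m5c_cv(levels):
    """Coefficient of variation of m5C levels across isoforms.

    Assumption: population standard deviation over the mean. Returns None
    when the mean is zero, because the ratio is undefined in that case.
    """
    average = mean(levels)
    if average == 0:
        return None
    return pstdev(levels) / average


# Hypothetical gene where only one of four isoforms carries m5C.
variable_gene = [0.0, 0.0, 0.0, 0.4]
# Hypothetical gene whose isoforms have similar m5C levels.
uniform_gene = [0.2, 0.25, 0.3]

is_variable = m5c_cv(variable_gene) > 1   # concentrated in one isoform
is_uniform = m5c_cv(uniform_gene) <= 1    # evenly spread
```

Under these assumptions, a gene whose m5C is concentrated in a single isoform exceeds the CV > 1 threshold, whereas a gene with similar levels across isoforms does not.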
