Genome Biol. 2020 Jul 8;21(1):167.
doi: 10.1186/s13059-020-02071-7.

Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data

Ralph Patrick et al. Genome Biol.

Abstract

High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3′ UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.

Keywords: Alternative polyadenylation; Differential transcript use; mRNA isoforms; scRNA-seq.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Sierra workflow. Sierra starts with a BAM file produced by an alignment program such as CellRanger. Standard gene-level workflow (top row) involves using a gene model to produce a matrix of gene-level counts used for clustering. The Sierra pipeline performs splice-aware peak calling to identify coordinates corresponding to potential polyadenylation sites. Peak coordinates are used to build an annotated UMI count matrix for each gene peak. This new data can be used to identify genes showing differential peak usage, with visualisation options for plotting relative peak expression and read coverage across selected cell populations.
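The counting step in Fig. 1 (peak coordinates turned into a UMI count matrix) can be pictured with a small Python sketch. It is not Sierra's own code: the `Peak` and `Read` records and the helper names are hypothetical, each read is reduced to a single position, and spliced alignments are ignored, so it only illustrates collapsing reads to distinct UMIs per peak and per cell.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Peak(NamedTuple):
    # Hypothetical peak record; coordinates are 1-based and inclusive.
    peak_id: str
    gene: str
    chrom: str
    start: int
    end: int
    strand: str


class Read(NamedTuple):
    # One aligned read reduced to the fields needed for counting
    # (cell barcode, UMI, strand and a single representative position).
    cell_barcode: str
    umi: str
    chrom: str
    pos: int
    strand: str


def assign_peak(read: Read, peaks: List[Peak]) -> Optional[str]:
    """Return the ID of the first same-strand peak containing the read position."""
    for peak in peaks:
        if (peak.chrom == read.chrom and peak.strand == read.strand
                and peak.start <= read.pos <= peak.end):
            return peak.peak_id
    return None


def peak_umi_counts(reads: Iterable[Read], peaks: List[Peak]) -> Dict[str, Dict[str, int]]:
    """Build a sparse peak x cell matrix of distinct-UMI counts: {peak_id: {cell: n}}."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for read in reads:
        peak_id = assign_peak(read, peaks)
        if peak_id is not None:  # reads outside every peak are dropped
            umis[(peak_id, read.cell_barcode)].add(read.umi)
    matrix: Dict[str, Dict[str, int]] = {}
    for (peak_id, cell), cell_umis in umis.items():
        matrix.setdefault(peak_id, {})[cell] = len(cell_umis)
    return matrix
```

Two reads from the same cell with the same UMI inside one peak count once, and a read that falls between peaks is discarded. A real implementation would use an interval index rather than the linear scan in `assign_peak`.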
Fig. 2
Representative features of Sierra data from a 7k-cell PBMC dataset. a Counts of genes according to number of detected peaks. Dotted red line indicates median number of peaks. b Average composition of genomic feature types that peaks fall on, according to number of peaks per gene. c Percentage of cells expressing each genomic feature type with increasing stringency of cellular detection rates for peaks. d Number of genes expressing multiple (≥2) 3′ UTR or exonic peaks with increasing stringency of cellular detection rates. e Comparison of PTPRC gene expression across cell populations on t-SNE coordinates with peaks identified as DU in monocytes. f, g Overlapping genes from CD14+ monocyte vs CD4+ T cell comparisons in the PBMC 7k and PBMC 4k datasets for f DTU genes and g DE genes, visualised with [28].
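Panels c and d vary a cellular detection-rate cutoff. A hypothetical Python sketch of such a filter, assuming a peak passes when the fraction of cells with a nonzero UMI count reaches the cutoff (the exact rule Sierra applies is not stated in this legend):

```python
from typing import Dict, List


def genes_with_multiple_peaks(
    peak_gene: Dict[str, str],
    counts: Dict[str, Dict[str, int]],
    n_cells: int,
    min_detection_rate: float,
    min_peaks: int = 2,
) -> List[str]:
    """Return genes with >= min_peaks peaks that pass the detection-rate cutoff.

    peak_gene maps peak ID -> gene; counts maps peak ID -> {cell: UMI count}.
    A peak passes if (cells with count > 0) / n_cells >= min_detection_rate.
    """
    passing: Dict[str, int] = {}
    for peak_id, gene in peak_gene.items():
        detected = sum(1 for c in counts.get(peak_id, {}).values() if c > 0)
        if detected / n_cells >= min_detection_rate:
            passing[gene] = passing.get(gene, 0) + 1
    return sorted(g for g, n in passing.items() if n >= min_peaks)
```

Sweeping `min_detection_rate` over increasing values and recording the length of the returned list gives a curve of the kind summarised in panel d.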
Fig. 3
Comparison of differential transcript usage between cardiac scRNA-seq and bulk RNA-seq populations. a t-SNE plot of the cardiac TIP cell lineages. b, c Gene expression visualised on t-SNE for b Cxcl12 and c Igf1. d, e Relative peak expression visualised on t-SNE for example DU peaks in d Cxcl12, between sham fibroblasts and sham ECs, and e Igf1, between sham fibroblasts and MI leukocytes. f, g Read coverage plots across the Cxcl12 and Igf1 genes for f single-cell and bulk fibroblast and EC populations (Cxcl12) and g single-cell and bulk sham fibroblast and MI leukocyte populations (Igf1). h Fisher's exact tests on the number of overlapping DTU genes detected from scRNA-seq and bulk RNA-seq for different cell type/condition comparisons. Shown are the −log10 p values and the percentage of single-cell DTU genes that overlap the bulk set. The red line indicates the significance threshold (p = 0.05). i Overlapping genes between single-cell and bulk RNA-seq for the sham fibroblast vs MI leukocyte comparison. j Log fold-change comparison for DU peaks identified in both single-cell and bulk RNA-seq for the sham fibroblast vs EC analysis, with the Spearman correlation coefficient shown.
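Panel h asks whether single-cell and bulk DTU gene lists overlap more than expected by chance. A one-sided Fisher's exact test on the 2×2 overlap table equals the upper tail of a hypergeometric distribution, which this Python sketch computes directly with `math.comb` (Python 3.8+). The gene universe passed in (for example, genes testable in both assays) is an assumption; the legend does not say which background was used.

```python
import math
from typing import Set, Tuple


def overlap_fisher_greater(
    sc_genes: Set[str], bulk_genes: Set[str], universe: Set[str]
) -> Tuple[float, float]:
    """One-sided (enrichment) Fisher's exact test for the overlap of two gene sets.

    Returns (p value, percentage of single-cell genes also found in the bulk set).
    """
    sc = sc_genes & universe
    bulk = bulk_genes & universe
    n, a, b = len(universe), len(sc), len(bulk)  # background, set sizes
    k = len(sc & bulk)                           # observed overlap
    # P(X >= k) for X ~ Hypergeometric(n, b, a): sum over overlaps k..min(a, b).
    tail = sum(math.comb(b, x) * math.comb(n - b, a - x) for x in range(k, min(a, b) + 1))
    p = tail / math.comb(n, a)
    pct = 100.0 * k / a if a else 0.0
    return p, pct
```

The p value should agree with `scipy.stats.fisher_exact` called with `alternative="greater"` on the table [[k, a − k], [b − k, n − a − b + k]]; plotting −log10 of it against −log10(0.05) reproduces the layout of panel h.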
Fig. 4
3′ UTR shortening in activated and proliferating cardiac fibroblasts following MI. a, b UMAP visualisation of fibroblast populations from Pdgfra-GFP+/CD31− mouse cardiac cells at 3 days post-sham or MI surgery, showing a an aggregate of all cells and b the UMAP plot separated by condition. c–e Counts of 3′ UTR peaks showing differential usage according to their location relative to the terminating exon. A location of 0 indicates the peak most proximal to the terminating exon, and 1 the most distal. Comparisons are c F-Cyc, d F-CI, and e F-Act, each against F-SL and F-SH combined. f, g Relative expression of the most distal and most proximal peaks (relative to the terminating exon) for f Timp2 and g Cd47, visualised on UMAP coordinates. h, i Read coverage across the 3′ UTR of h Timp2 and i Cd47 for selected single-cell fibroblast populations from sham (F-SL/F-SH combined) and MI (F-Act, F-CI, F-Cyc) datasets, compared with bulk RNA-seq of FACS-sorted fibroblasts from sham and MI conditions.
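The legend fixes only the endpoints of the relative-location scale (0 for the peak nearest the terminating exon, 1 for the most distal). This Python sketch assumes evenly spaced ranks in between, which is an illustrative choice rather than Sierra's documented rule, and then tallies differentially used peaks by location as in panels c–e; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_utr_locations(distance_from_exon: Dict[str, int]) -> Dict[str, float]:
    """Place one gene's 3' UTR peaks on [0, 1] by distance from the terminating exon.

    0 is the most proximal peak and 1 the most distal. Intermediate peaks are
    spaced evenly by rank (an assumption made for this sketch).
    """
    ordered = sorted(distance_from_exon, key=lambda peak: distance_from_exon[peak])
    if len(ordered) < 2:
        # A lone peak has no proximal/distal alternative; place it at 0 by convention.
        return {peak: 0.0 for peak in ordered}
    last = len(ordered) - 1
    return {peak: rank / last for rank, peak in enumerate(ordered)}


def tally_du_locations(locations: Dict[str, float], du_peaks: Iterable[str]) -> Counter:
    """Count differentially used peaks by relative location (rounded to 2 decimals)."""
    return Counter(round(locations[p], 2) for p in du_peaks if p in locations)
```

An excess of DU peaks at location 0 (relatively higher proximal usage) would be the pattern expected for 3′ UTR shortening.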
Fig. 5
In vivo qRT-PCR validation of candidate genes with altered 3′ UTR length in proliferating cardiac fibroblasts. a Diagram showing the anatomical zones of an MI heart: remote zone (RZ) and infarct zone (IZ). b Representative immunofluorescence images showing EdU+ cells in sham hearts or the indicated anatomical zone of MI hearts. Scale bar, 50 µm. c qRT-PCR expression of proliferation marker genes in Pdgfra-GFP+ cells sorted from sham hearts and from RZ and IZ samples. Shown are the mean expression and standard error (n = 3); stars indicate a significant difference between comparisons (one-tailed t test; p < 0.05). d Sierra candidate genes exhibiting a shift to proximal or distal peak usage in scRNA-seq population F-Cyc compared with F-SH/F-SL. Shown is the difference between proximal and distal peak fold-changes (log2). e qRT-PCR comparison of proximal-to-distal (P–D) peak expression for candidate genes in Pdgfra-GFP+ cells (Additional file 2: Figure S7) sorted from sham hearts and from RZ and IZ samples. The y-axis represents ΔΔ(P–D) expression (log2; see the 'Methods' section) for the sample comparisons RZ vs sham, IZ vs RZ, and IZ vs sham. Shown are the mean expression difference and standard error (n = 3); stars indicate a significant difference for the comparison (one-tailed t test; p < 0.05).
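The ΔΔ(P–D) quantity in panel e is defined in the paper's Methods, which are not reproduced here. As a hedged illustration only, this Python sketch uses the common comparative-Ct convention, assuming roughly 100% amplification efficiency so that log2 P − log2 D = Ct_D − Ct_P, and ΔΔ(P–D) is the mean (P–D) of the test group minus that of the reference group. The function names and the (Ct_proximal, Ct_distal) input layout are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def p_minus_d(ct_proximal: float, ct_distal: float) -> float:
    """log2 proximal-minus-distal expression for one sample from qRT-PCR Ct values.

    Assumes ~100% amplification efficiency, so log2(expression) = -Ct + const
    and log2(P) - log2(D) = Ct_D - Ct_P.
    """
    return ct_distal - ct_proximal


def delta_delta_pd(
    test: Sequence[Tuple[float, float]],
    reference: Sequence[Tuple[float, float]],
) -> float:
    """Mean (P-D) of the test group minus mean (P-D) of the reference group (log2).

    Each element is a (Ct_proximal, Ct_distal) pair for one biological replicate.
    Positive values mean the proximal signal rises relative to the distal signal
    in `test`.
    """
    test_pd = [p_minus_d(p, d) for p, d in test]
    ref_pd = [p_minus_d(p, d) for p, d in reference]
    return mean(test_pd) - mean(ref_pd)
```

With per-replicate Ct pairs for each group, `delta_delta_pd(iz_samples, sham_samples)` would correspond to the IZ vs sham comparison (the variable names are placeholders).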
Fig. 6
Detecting differential transcript usage across the Tabula Muris dataset. a Comparison of the number of DU peaks across cell types within each tissue. Only cell types with more than 100 cells are included in the analysis. b, c Mammary gland tissue results. b Number of DU peaks between cell types. c Relative expression plot of DTU genes between a cell type and all remaining cell types in the tissue.
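Panel a keeps only cell types with more than 100 cells. A hypothetical Python sketch of that per-cell-type tally, assuming DU results arrive as (tissue, cell type, peak, adjusted p) rows and using an illustrative adjusted-p cutoff of 0.05 that is not taken from the legend:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def du_peaks_per_cell_type(
    cell_counts: Dict[Tuple[str, str], int],
    du_results: Iterable[Tuple[str, str, str, float]],
    min_cells: int = 100,
    padj_cutoff: float = 0.05,  # illustrative threshold, not Sierra's documented default
) -> Dict[Tuple[str, str], int]:
    """Count DU peaks per (tissue, cell type), keeping only cell types with > min_cells cells.

    cell_counts maps (tissue, cell_type) -> number of cells;
    du_results yields (tissue, cell_type, peak_id, adjusted_p) rows.
    """
    eligible = {key for key, n in cell_counts.items() if n > min_cells}
    counts: Counter = Counter()
    for tissue, cell_type, _peak_id, padj in du_results:
        key = (tissue, cell_type)
        if key in eligible and padj < padj_cutoff:
            counts[key] += 1
    # Report every eligible cell type, including those with zero DU peaks.
    return {key: counts.get(key, 0) for key in sorted(eligible)}
```

Grouping the returned counts by tissue gives the per-tissue distributions compared in panel a.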

References

1. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40(12):1413–5.
2. Wang R, Zheng D, Yehia G, Tian B. A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res. 2018;28(10):1427–41.
3. Baralle FE, Giudice J. Alternative splicing as a regulator of development and tissue identity. Nat Rev Mol Cell Biol. 2017;18:437.
4. Tian B, Manley JL. Alternative polyadenylation of mRNA precursors. Nat Rev Mol Cell Biol. 2016;18:18.
5. Wang ET, Ward AJ, Cherone J, Wang TT, Giudice J, Treacy D, Freese P, Lambert NJ, Saxena T, Cooper TA, Burge CB. Antagonistic regulation of mRNA expression and splicing by CELF and MBNL proteins. Genome Res. 2015. doi: 10.1101/gr.184390.114.
