Genome Res. 2020 Aug;30(8):1191-1200.
doi: 10.1101/gr.260174.119. Epub 2020 Aug 17.

RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes

Free PMC article

Ka Ming Nip et al. Genome Res. 2020 Aug.

Abstract

Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data, whereas isoform sequence analysis at the single-cell level remains fairly limited. Detection and discovery of isoforms in single cells is difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that leverages the rich information content aggregated from multiple single-cell transcriptomes to reconstruct cell-specific isoforms. Assembly with RNA-Bloom can be either reference-guided or reference-free, thus enabling unbiased discovery of novel isoforms or foreign transcripts. We compared both assembly strategies of RNA-Bloom against five state-of-the-art reference-free and reference-based transcriptome assembly methods. In our benchmarks on a simulated 384-cell data set, reference-free RNA-Bloom reconstructed 37.9%–38.3% more isoforms than the best reference-free assembler, whereas reference-guided RNA-Bloom reconstructed 4.1%–11.6% more isoforms than reference-based assemblers. When applied to a real 3840-cell data set consisting of more than 4 billion reads, RNA-Bloom reconstructed 9.7%–25.0% more isoforms than the best competing reference-based and reference-free approaches evaluated. We expect RNA-Bloom to boost the utility of scRNA-seq data beyond gene expression analysis, expanding what is informatically accessible now.

Figures

Figure 1.
Assembly quality on mouse simulated bulk RNA-seq data. (A) True positive rate calculated based on I95, denoted as TPR95. (B) True positive rate at four transcript expression strata. Isoforms in strata 1, 2, 3, and 4 have nonzero values of transcripts per million (TPM) in the lowest quartile, second-lowest quartile, second-highest quartile, and the highest quartile, respectively. (C) False-discovery rate calculated based on I95, denoted as FDR95. (D) Misassembly rate calculated based on I95, denoted as MR95.
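The expression strata described in panel B can be illustrated with a minimal sketch. It assumes that quartile cut points are computed over nonzero TPM values only, and that the per-stratum true positive rate is the share of expressed isoforms in a stratum reconstructed to at least 95% of annotated length. The caption does not spell out the exact denominator or the quartile interpolation method, so both are assumptions, and the function and variable names are hypothetical.

```python
from statistics import quantiles


def assign_strata(tpm_by_isoform):
    """Map each isoform with nonzero TPM to a stratum (1 = lowest quartile, 4 = highest)."""
    expressed = {iso: tpm for iso, tpm in tpm_by_isoform.items() if tpm > 0}
    # Quartile cut points over nonzero TPM values; "inclusive" interpolation is an assumption.
    q1, q2, q3 = quantiles(list(expressed.values()), n=4, method="inclusive")
    strata = {}
    for iso, tpm in expressed.items():
        if tpm <= q1:
            strata[iso] = 1
        elif tpm <= q2:
            strata[iso] = 2
        elif tpm <= q3:
            strata[iso] = 3
        else:
            strata[iso] = 4
    return strata


def tpr95_by_stratum(strata, reconstructed_fraction):
    """Assumed definition: isoforms reconstructed to >= 95% length / expressed isoforms, per stratum."""
    counts = {s: [0, 0] for s in (1, 2, 3, 4)}  # stratum -> [reconstructed, total]
    for iso, stratum in strata.items():
        counts[stratum][1] += 1
        # Isoforms absent from the assembly count as 0% reconstructed.
        if reconstructed_fraction.get(iso, 0.0) >= 0.95:
            counts[stratum][0] += 1
    return {s: (hit / total if total else 0.0) for s, (hit, total) in counts.items()}
```

Isoforms with zero TPM are excluded before the quartiles are computed, matching the caption's restriction to nonzero TPM values.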
Figure 2.
Assembly quality on mouse simulated and real bulk RNA-seq data. Assembly sensitivity was measured as the number of isoforms reconstructed to at least 95% annotated isoform length (denoted as I95) in each data set normalized by the total number of isoforms in reference annotation. Misassembly rate was calculated based on I95, denoted as MR95.
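The sensitivity measure in this caption can be written compactly. The symbol N_annotated for the total number of isoforms in the reference annotation is introduced here for illustration and does not appear in the source.

```latex
\[
\text{sensitivity} \;=\; \frac{I_{95}}{N_{\text{annotated}}},
\qquad
I_{95} = \#\{\text{isoforms reconstructed to} \ge 95\%\ \text{of annotated length}\}
\]
```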
Figure 3.
Assembly quality on simulated data for 384 mouse cells. (A) True positive rate calculated based on isoforms reconstructed to at least 95% of annotated length, denoted as TPR95. (B) True positive rate at different transcript expression strata. (C) False-discovery rate calculated based on isoforms reconstructed to at least 95% of annotated length, denoted as FDR95. (D) Misassembly rate calculated based on isoforms reconstructed to at least 95% of annotated length, denoted as MR95. Distributions of each metric were measured over all 384 cells. The comparison bars on top between RB(pool) and RB(ref,pool) and the next best performer in each class indicate statistical significance of the difference between distributions at P < 0.001 (***) or no significance (NS) using the Wilcoxon test.
Figure 4.
Cell precision and true positive rate of RNA-Bloom's reference-free and reference-guided modes over 384 cells. The simulated data set of 384 cells is split into smaller subpools: four subpools of 96 cells each, or pools of 1 cell. A pool size of 1 means no pooling between cells. Each subpool is assembled separately, and the cell precision of the assembled isoforms in each cell is calculated based on the I95. True positive rate was calculated based on I95, denoted as TPR95. Distributions of cell precision and TPR95 were measured over all 384 cells. The comparison bars on top between different pool sizes indicate statistical significance of the difference between distributions at P < 0.001 (***), P < 0.01 (**), P < 0.05 (*), or no significance (NS) using the Wilcoxon test.
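The subpooling scheme can be sketched as a simple partition of the cell list. The caption does not say how cells were assigned to subpools, so assigning them in consecutive order is an assumption made here, and the function name and cell identifiers are hypothetical.

```python
def split_into_pools(cell_ids, pool_size):
    """Partition cells into consecutive subpools of at most pool_size cells each."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return [cell_ids[i:i + pool_size] for i in range(0, len(cell_ids), pool_size)]


# Hypothetical identifiers for the 384 simulated cells.
cells = [f"cell_{i:03d}" for i in range(384)]

pools_of_96 = split_into_pools(cells, 96)  # four subpools of 96 cells
unpooled = split_into_pools(cells, 1)      # pool size 1: no pooling between cells
```

Each resulting subpool would then be assembled separately, with metrics still reported per cell.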
Figure 5.
Assembly sensitivity on experimental data of 96 mouse embryonic stem cells with ERCC and SIRV spiked-in transcripts. (A) Number of spiked-in transcripts reconstructed to at least 50% annotated length, denoted as S50. (B) Number of spiked-in transcripts reconstructed to at least 95% annotated length, denoted as S95. Distributions of each metric were measured over all 96 cells. RB(ref) and RB(ref,pool) assemblies were guided by the mouse transcriptome reference. The comparison bars on top indicate statistical significance of the difference between distributions at P < 0.001 (***), P < 0.01 (**), P < 0.05 (*), or no significance (NS) using the Wilcoxon test.
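The significance annotations on the comparison bars follow fixed P value thresholds, which can be expressed as a small helper. This sketch assumes the P value has already been computed by a Wilcoxon test elsewhere (the caption does not state which variant of the test was used), and the function name is hypothetical.

```python
def significance_label(p_value):
    """Translate a P value into the star notation used in the figure captions."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "NS"  # no significance
```

The thresholds are checked from strictest to loosest, so each P value receives the strongest label it qualifies for.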
Figure 6.
Assembly quality evaluation on an experimental scRNA-seq data set consisting of 3840 mouse microglia cells. (A) Number of isoforms reconstructed to at least 50% of annotated length, denoted as I50. (B) Number of isoforms reconstructed to at least 95% of annotated length, denoted as I95. (C) Misassembly rate calculated based on I50, denoted as MR50. (D) Misassembly rate calculated based on I95, denoted as MR95. Distributions of each metric were measured over all 3840 cells.
Figure 7.
Clustering of microglial cells based on isoform reconstruction. The first row indicates 10 cell clusters, and colors in subsequent rows encode three levels of isoform reconstruction: none (below 50%), partial (at least 50% but below 95%), and full-length (at least 95%). The labels refer to isoforms of microglial genes that have either partial or full-length reconstruction in at least 38 cells.
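The three reconstruction levels and the 38-cell labeling threshold can be illustrated with a short sketch. It assumes each isoform comes with a list of per-cell reconstructed length fractions between 0 and 1. That input layout and the function names are hypothetical, not taken from the source.

```python
def reconstruction_level(fraction):
    """Classify a reconstructed length fraction into the three levels from the caption."""
    if fraction >= 0.95:
        return "full-length"
    if fraction >= 0.50:
        return "partial"
    return "none"


def isoforms_to_label(fractions_by_isoform, min_cells=38):
    """Keep isoforms with partial or full-length reconstruction in at least min_cells cells."""
    labeled = []
    for isoform, fractions in fractions_by_isoform.items():
        supported = sum(1 for f in fractions if reconstruction_level(f) != "none")
        if supported >= min_cells:
            labeled.append(isoform)
    return labeled
```

An isoform reconstructed to 60% of its length in 38 cells would be labeled, while one reconstructed to full length in only 37 cells would not.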
Figure 8.
Clustering of microglial cells based on isoform reconstruction of Trem2. Colors encode three levels of isoform reconstruction: none (below 50%), partial (at least 50% but below 95%), and full-length (at least 95%). The labels refer to Trem2 isoforms with either partial or full-length reconstruction in at least 38 cells.
Figure 9.
Pooled assembly of scRNA-seq data in RNA-Bloom illustrated for data from two cells.
