Nat Commun. 2018 Jul 10;9(1):2667.
doi: 10.1038/s41467-018-05083-x.

Detection and removal of barcode swapping in single-cell RNA-seq data

Jonathan A Griffiths et al. Nat Commun. 2018.

Abstract

Barcode swapping results in the mislabelling of sequencing reads between multiplexed samples on patterned flow-cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays; however, the severity and consequences of barcode swapping remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in two plate-based single-cell RNA-sequencing datasets. We found that approximately 2.5% of reads were mislabelled between samples on the HiSeq 4000, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA-sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10x Genomics experiments, allowing the continued use of cutting-edge sequencing machines for these assays.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
A schematic of the mechanism for barcode swapping, as proposed by Sinha et al. On new models of Illumina sequencing machines, flow cell seeding and DNA amplification take place simultaneously, without any washes of the flow cell between steps. As a result, free sample indexing barcodes remain in solution and can be inadvertently extended using DNA molecules from libraries with different barcodes as templates. The transfer of mislabelled molecules between nanowells of the flow cell results in clustering and sequencing of incorrectly labelled DNA molecules.
Fig. 2
Characterization of barcode swapping in plate-based scRNA-seq experiments. a The experimental design of the Richard dataset. Two 96-well plates of cells were multiplexed for sequencing. Expected barcode combinations are marked in blue, while impossible barcode combinations are marked in grey. b Distribution of the library sizes (i.e., number of mapped reads) in the expected and impossible barcode combinations. c Library size of each impossible combination (observed swapped reads), plotted against the sum of the library sizes of the expected combinations that share exactly one barcode with that impossible combination (available swapping reads). An example is illustrated in the inset for one impossible combination (red) and the contributing expected combinations (orange). The gradient represents the fraction of available reads from the expected combinations that swap into each impossible combination. d Estimated swapping fractions for different plates of the Nestorowa et al. [10] dataset, plotted against the ratio of the concentration of free barcode to the concentration of cDNA of the correct length for sequencing. A linear regression fit is shown with its 95% confidence interval. The slope of the fitted line is not significantly different from 0 (p = 0.129).
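
The quantities plotted in panel c can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the representation of each library as a pair of barcodes, and the least-squares fit through the origin are all assumptions made for illustration. For each impossible combination it sums the library sizes of the expected combinations that share exactly one barcode with it, then estimates the gradient of observed against available reads.

```python
def available_reads(lib_sizes, expected, combo):
    """Sum library sizes of expected combinations sharing exactly one barcode with combo."""
    first, second = combo
    # (a == first) != (b == second) is true only when exactly one barcode matches.
    return sum(
        lib_sizes[(a, b)]
        for (a, b) in expected
        if (a == first) != (b == second)
    )


def swap_gradient(lib_sizes, expected):
    """Gradient of observed swapped reads against available swapping reads.

    Assumption: a least-squares line through the origin; the fitting
    procedure used in the paper is not stated in this text.
    """
    xs, ys = [], []
    for combo, observed in lib_sizes.items():
        if combo in expected:
            continue  # only impossible combinations carry swapped reads
        xs.append(available_reads(lib_sizes, expected, combo))
        ys.append(observed)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Toy data (hypothetical): two expected and two impossible combinations.
lib_sizes = {("a", "x"): 1000, ("b", "y"): 2000, ("a", "y"): 30, ("b", "x"): 30}
expected = {("a", "x"), ("b", "y")}
gradient = swap_gradient(lib_sizes, expected)
```

In this toy example each impossible combination has 3000 available reads and 30 observed reads, so the gradient is 0.01. The sketch does not guard against the case where no impossible combination has any available reads.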
Fig. 3
Characterization of barcode swapping in droplet-based scRNA-seq experiments. a The expected number of cells with shared cell barcodes in 10x Genomics samples that have been multiplexed for sequencing, for different numbers of samples and different numbers of captured cells per sample. The cell exclusion approach for barcode swapping would remove these cells. b A schematic of our method to remove swapped reads from droplet data. Reads found in different samples with the same combination of UMI, cell barcode, and aligned gene were considered to have swapped. If most reads (≥80%) were present in one sample, we excluded the molecule from all other samples (i). If reads were more evenly spread across samples, we excluded the molecule from all samples (ii). Molecules with reads in only one sample were retained (iii). c t-SNE plot of the expression profiles of mouse epithelial cells. Each point represents a cell that is coloured by sample. Letters correspond to different experimental conditions, while numbers represent biological replicates. d The distribution of the library sizes for called cells in each sample. Cells were called using emptyDrops, with an FDR threshold of 1% and a minimum of 1000 UMIs. e The number of called cells for each sample, before and after application of our swapped read exclusion algorithm.
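
The three cases in panel b can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name `remove_swapped`, the tuple layout of the input records, and the `min_frac` parameter name are assumptions. Only the grouping key (UMI, cell barcode, aligned gene) and the 80% threshold come from the caption.

```python
from collections import defaultdict


def remove_swapped(records, min_frac=0.8):
    """Filter molecules that appear to have swapped between samples.

    records: iterable of (sample, cell_barcode, umi, gene, n_reads).
    Returns the retained (sample, cell_barcode, umi, gene, n_reads) tuples.
    """
    # Tally reads per sample for each (cell barcode, UMI, gene) combination.
    per_molecule = defaultdict(dict)
    for sample, cell, umi, gene, n_reads in records:
        counts = per_molecule[(cell, umi, gene)]
        counts[sample] = counts.get(sample, 0) + n_reads

    kept = []
    for key, counts in per_molecule.items():
        total = sum(counts.values())
        top_sample, top_reads = max(counts.items(), key=lambda kv: kv[1])
        # Case (iii): one sample only, the fraction is 1.0 and the molecule is kept.
        # Case (i): a dominant sample (>= min_frac) keeps the molecule; others lose it.
        # Case (ii): no dominant sample, so the molecule is dropped everywhere.
        if top_reads / total >= min_frac:
            kept.append((top_sample, *key, top_reads))
    return kept


# Toy input (hypothetical): one dominant, one evenly split, one unshared molecule.
records = [
    ("A", "c1", "u1", "g1", 9),
    ("B", "c1", "u1", "g1", 1),
    ("A", "c2", "u2", "g2", 5),
    ("B", "c2", "u2", "g2", 5),
    ("B", "c3", "u3", "g3", 4),
]
kept = remove_swapped(records)
```

Here the first molecule is kept only in sample A (9 of 10 reads), the second is removed from both samples, and the third is retained in sample B.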

References

    1. Sinha, R. et al. Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing. Preprint at bioRxiv: http://biorxiv.org/content/early/2017/04/09/125724 (2017).
    2. Costello M, et al. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genom. 2018;19:332. doi: 10.1186/s12864-018-4703-0.
    3. Zheng GX, et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017;8:14049. doi: 10.1038/ncomms14049.
    4. Schiebinger, G. et al. Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming. Preprint at bioRxiv: https://www.biorxiv.org/content/early/2017/09/27/191056 (2017).
    5. Dixit A, et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–1866.e17. doi: 10.1016/j.cell.2016.11.038.