Nat Commun. 2018 Nov 2;9(1):4610.
doi: 10.1038/s41467-018-06916-5.

Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration

James M McFarland et al. Nat Commun. 2018.

Abstract

The availability of multiple datasets comprising genome-scale RNAi viability screens in hundreds of diverse cancer cell lines presents new opportunities for understanding cancer vulnerabilities. Integrated analyses of these data to assess differential dependency across genes and cell lines are challenging due to confounding factors such as batch effects and variable screen quality, as well as difficulty assessing gene dependency on an absolute scale. To address these issues, we incorporated cell line screen-quality parameters and hierarchical Bayesian inference into DEMETER2, an analytical framework for analyzing RNAi screens (https://depmap.org/R2-D2). This model substantially improves estimates of gene dependency across a range of performance measures, including identification of gold-standard essential genes and agreement with CRISPR/Cas9-based viability screens. It also allows us to integrate information across three large RNAi screening datasets, providing a unified resource representing the most extensive compilation of cancer cell line genetic dependencies to date.

Conflict of interest statement

W.C.H. is a consultant for Thermo-Fisher, Parexel, AjuIB, MPM Capital and KSQ Therapeutics and receives research funding from Deerfield Management. W.C.H. is a founder and has equity in KSQ Therapeutics. T.R.G. is a consultant to Foundation Medicine and GlaxoSmithKline, and is a shareholder of FORMA Therapeutics. D.E.R. receives research funding from members of the Functional Genomics Consortium (Abbvie, Janssen, Merck, Vir), and is a director of Addgene, Inc. A.T. is a consultant for Tango Therapeutics. All remaining authors declare no competing interests.

Figures

Fig. 1
DEMETER2 improves identification of essential genes. a Both D1 and D2 represent the observed shRNA log fold change (LFC) depletion values in each cell line (CL) as a combination of gene knockdown and off-target seed effects. D2 introduces a number of additional model components highlighted in the schematic diagram. b Separation of gene dependency distributions for known common essential genes and non-essential (unexpressed) genes is measured by the strictly standardized mean difference (SSMD). Positive/negative control separation was much better for DEMETER2 gene dependency scores (blue dots) compared with per-gene averaging of shRNA depletion scores (GA; yellow dots) in both the Achilles (left) and DRIVE (right) datasets. c D2 estimates of across-cell-line average gene dependency showed improved separation of positive and negative control genes compared with previous methods. d Across-cell-line average gene dependency scores were in better agreement between datasets (Achilles RNAi, DRIVE RNAi, and CRISPR-Cas9 data) when using D2 estimates compared with previous methods. Each bar chart shows the correlation of average dependency scores between a pair of datasets. Colors represent agreement when using different models for estimating dependencies from RNAi data
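The two quantities compared in panel b, the per-gene averaging (GA) baseline and the SSMD separation score, can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the shRNA LFC values for a single cell line are held in a dict with a known shRNA-to-gene mapping, and it uses the common SSMD definition (difference of group means divided by the square root of the summed group variances). The shRNA and gene names (`E1_a`, `N1`, ...) are hypothetical, and computing non-essential minus essential, so that better separation gives a larger positive value, is our sign convention rather than necessarily the paper's.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def gene_average(shrna_lfc, shrna_to_gene):
    """Per-gene averaging (GA) baseline: mean LFC of all shRNAs targeting each gene."""
    by_gene = defaultdict(list)
    for shrna, lfc in shrna_lfc.items():
        by_gene[shrna_to_gene[shrna]].append(lfc)
    return {gene: mean(values) for gene, values in by_gene.items()}


def ssmd(group_a, group_b):
    """Strictly standardized mean difference: (mean_a - mean_b) / sqrt(var_a + var_b)."""
    return (mean(group_a) - mean(group_b)) / math.sqrt(variance(group_a) + variance(group_b))


# Hypothetical LFC values for one cell line, two shRNAs per gene.
# E* genes stand in for common essentials, N* genes for unexpressed non-essentials.
shrna_lfc = {
    "E1_a": -2.0, "E1_b": -1.6,
    "E2_a": -1.4, "E2_b": -1.0,
    "E3_a": -1.7, "E3_b": -1.3,
    "N1_a": 0.1, "N1_b": -0.1,
    "N2_a": 0.3, "N2_b": 0.1,
    "N3_a": -0.3, "N3_b": -0.1,
}
shrna_to_gene = {name: name.split("_")[0] for name in shrna_lfc}

ga = gene_average(shrna_lfc, shrna_to_gene)
essential = [ga[g] for g in ("E1", "E2", "E3")]
nonessential = [ga[g] for g in ("N1", "N2", "N3")]

# Non-essential minus essential, so better control separation gives a larger positive value.
separation = ssmd(nonessential, essential)
print(f"GA SSMD: {separation:.2f}")
```

GA ignores which shRNAs share seed sequences, which is why, per panel a, both D1 and D2 instead model the observed LFC as a combination of gene knockdown and seed effects.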
Fig. 2
D2 corrects biases related to variable screen quality. a Comparison of across-cell-line average gene dependency scores with scores estimated for individual example low- (left) and high- (right) quality screens. Density estimates for the set of gold-standard common essential and non-essential genes are highlighted by the red and blue contours, respectively. Estimates using gene-averaging (GA; top plots) show broad systematic differences across all essential genes in these cell lines compared with the population average. These systematic differences are corrected for by D2 (bottom plots). b The screen quality estimated for each cell line (SSMD of positive/negative control gene dependencies, using GA) was correlated with the expression level of AGO2 for both the Achilles (Spearman’s rho = 0.39; p < 2.2 × 10⁻¹⁶; green) and DRIVE (rho = 0.37; p = 1.3 × 10⁻¹³; gold) datasets. c Correlation between each gene’s dependency profile and mRNA expression of AGO2 is plotted against the across-cell-line average dependency score for the gene, with curated common-essential and non-essential genes indicated with red and blue dots, respectively. Using D1 (left), gene dependency profiles were systematically (negatively) correlated with the AGO2 expression for more common essential genes. This correlation was eliminated using D2 (right)
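Panel b relates a per-cell-line screen-quality score to AGO2 expression with Spearman's rank correlation. The sketch below shows that statistic in plain Python: rank both variables (tied values share their average rank), then take the Pearson correlation of the ranks. The input values are hypothetical, and the sketch does not compute the p-values reported in the legend; `scipy.stats.spearmanr` returns both the coefficient and a p-value if SciPy is available.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell-line values: screen quality (e.g. a GA-based SSMD) and AGO2 expression.
screen_quality = [1.2, 2.5, 0.8, 3.1, 2.0]
ago2_expression = [5.0, 6.5, 4.2, 7.0, 5.1]
print(f"Spearman rho: {spearman(screen_quality, ago2_expression):.2f}")
```

Because it uses only ranks, the statistic captures any monotonic relationship between screen quality and AGO2 level without assuming linearity.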
Fig. 3
Screen quality biases impair dependency correlation analyses. a Correlation between pairs of gene dependency profiles, estimated using D1 applied to the Achilles data, increased systematically for gene pairs that were more essential on average. Red line shows the smoothed average. Color shows the density of data points in each region (log color scale). b Same as a but using D2, showing that the systematic upward bias in the pairwise correlation between pairs of common essential genes is removed. c Gene dependency correlation network surrounding MED14, constructed using D1 dependency estimates applied to the DRIVE dataset. Each node represents a gene, with edges depicting strong pairwise correlations (see Methods). Red nodes indicate genes that form protein complexes with MED14. Node size indicates the across-cell-line average dependency score for each gene (larger nodes representing more common-essential genes). d Same as c using D2, showing that the local dependency correlation network for MED14 is more enriched for co-complex members
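The upward bias in panel a can be reproduced with a toy simulation. The sketch assumes that each cell line has a screen signal (in the spirit of the screen signal parameter shown in Fig. 4d) that multiplies every gene's true dependency. This multiplicative form, the parameter values, and the helper `simulate_pair` are assumptions for illustration, not the D2 likelihood. Two essential genes with independent true dependencies become strongly correlated because they share the per-line scaling. Two non-essential genes do not, since scaling a value near zero has little effect.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair(rng, signal, mean_dependency, gene_sd=0.1, noise_sd=0.05):
    """Observed scores for two genes whose true dependencies are drawn independently.

    Assumed toy model: observed = signal[line] * true_dependency + Gaussian noise.
    """
    true_a = [mean_dependency + rng.gauss(0.0, gene_sd) for _ in signal]
    true_b = [mean_dependency + rng.gauss(0.0, gene_sd) for _ in signal]
    obs_a = [s * t + rng.gauss(0.0, noise_sd) for s, t in zip(signal, true_a)]
    obs_b = [s * t + rng.gauss(0.0, noise_sd) for s, t in zip(signal, true_b)]
    return obs_a, obs_b


rng = random.Random(0)
# Hypothetical screen signal per cell line: values near 1.0 mimic strong screens, 0.3 weak ones.
signal = [rng.uniform(0.3, 1.0) for _ in range(200)]

ess_a, ess_b = simulate_pair(rng, signal, mean_dependency=-1.0)  # two common-essential genes
non_a, non_b = simulate_pair(rng, signal, mean_dependency=0.0)   # two non-essential genes

r_essential = pearson(ess_a, ess_b)
r_nonessential = pearson(non_a, non_b)
# Idealized correction: divide by the true signal. A real model must estimate it from data.
r_corrected = pearson([o / s for o, s in zip(ess_a, signal)],
                      [o / s for o, s in zip(ess_b, signal)])

print(f"essential pair:      r = {r_essential:.2f}")
print(f"non-essential pair:  r = {r_nonessential:.2f}")
print(f"essential, rescaled: r = {r_corrected:.2f}")
```

In this toy, rescaling by the true signal removes the spurious correlation. The point of a model such as D2 is to estimate that per-line signal from the data, since it is not observed directly.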
Fig. 4
D2 improves estimated dependency profiles, particularly for essential genes. a Average correlation between RNAi and CRISPR-Cas9 gene dependency profiles as a function of the across-cell-line average dependency score, using the Achilles dataset. Different colored curves and shaded regions show the smoothed conditional mean correlation, and 95% confidence intervals, obtained using different models for estimating RNAi gene dependencies. D2 gene dependency estimates show better average agreement with CRISPR-Cas9 dependency profiles compared to existing methods. b Average magnitude of correlation between each gene’s dependency and mRNA expression profiles (again for the Achilles dataset), plotted as a function of across-cell-line average gene dependency as in a. D2 dependency scores showed a stronger correlation with the gene’s own expression levels compared with existing methods. c Similar to b, showing stronger correlations between D2 dependency profiles and the genes’ own relative copy number, particularly for genes which are more essential on average. d Scatterplot of RPL37 dependency vs. RPL37 relative copy number using D1 (left) and D2 (right) dependency scores. Color represents the screen signal parameter estimated (from D2) for each cell line. e A benchmark set of dependency-genomic feature relationships identified from CRISPR-Cas9 data (see Methods) was used to evaluate the extent to which Achilles RNAi dependency estimates recapitulated the same associations. Colored curves show the empirical distributions of correlation magnitude across these dependency-feature pairs for each model. D2 dependency estimates showed better agreement with benchmark genomic feature associations compared to existing methods. Bar chart at the right shows the fraction of dependency-feature pairs with correlation magnitude greater than 0.4 for each model
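The summary in the bar chart of panel e (fraction of benchmark dependency-feature pairs whose correlation magnitude exceeds 0.4) can be sketched as follows. The dict layout, the gene and feature names, and the profiles are hypothetical. The Methods used to construct the benchmark set from CRISPR-Cas9 data are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benchmark_agreement(dependencies, features, pairs, threshold=0.4):
    """|r| for each benchmark (dependency, feature) pair, plus the fraction above threshold."""
    magnitudes = [abs(pearson(dependencies[d], features[f])) for d, f in pairs]
    fraction = sum(m > threshold for m in magnitudes) / len(magnitudes)
    return magnitudes, fraction


# Hypothetical dependency profiles and genomic features over four cell lines.
dependencies = {"GENE_A": [-1.0, -0.5, 0.0, 0.5], "GENE_B": [0.2, 0.1, 0.3, 0.2]}
features = {
    "GENE_A_expression": [1.0, 2.0, 3.0, 4.0],
    "GENE_B_copy_number": [1.0, 2.0, 3.0, 4.0],
}
pairs = [("GENE_A", "GENE_A_expression"), ("GENE_B", "GENE_B_copy_number")]

magnitudes, fraction = benchmark_agreement(dependencies, features, pairs)
print([round(m, 2) for m in magnitudes], fraction)
```

Running the same benchmark pairs through dependency estimates from each model, and comparing the resulting distributions of `magnitudes`, gives the kind of comparison shown by the colored curves.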
Fig. 5
D2 effectively integrates multiple RNAi screen datasets. a Venn diagram showing the overlap of cell lines screened across the DRIVE, Achilles and Marcotte et al. datasets. Pie chart showing the composition of the combined dataset by primary disease. b The first two principal components of the gene dependency data for the combined GA (left) and D2 (right) data. Different colors represent which set of experiments were used to screen each cell line. Color scheme is the same as a, though light and dark blue indicate the cell lines screened with the “98k” and “55k” libraries in the Achilles dataset. c The statistical significance of measured associations between the benchmark dependency/feature pairs (same set as in Fig. 4e) is greatly increased when using the combined D2 dataset. Plot shows the empirical CDF of negative log p-values for dependency-feature associations using each model. d Number of genes whose dependency profile is most correlated with a genomic feature that is from the same (red) or a related (green) gene, when using different models and datasets. e Dependency-feature associations identified by each model are classified as CYCLOPS, oncogene expression, oncogene mutation, paralog loss, and physical interactions. The D2 combined model identifies more relationships in nearly every class compared to using the individual datasets or other models. f Correlation between gene dependency and gene dosage (either mRNA or copy number) is compared across genes using either the combined D2 data or the best correlated of either D1 dataset (Achilles or DRIVE; see Methods). Subset at the right shows genes with strong positive dose-dependency correlations (i.e., CYCLOPS-like genes). g Comparison of feature-dependency correlation magnitude (in D2 combined data) with across-cell-line average dependency for CYCLOPS (left) and paralog loss (right) relationships. Color depicts which relationships were uniquely identified by the D2 combined data vs. shared with D1 datasets
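The count in panel d (genes whose dependency profile is most correlated with a feature of the same gene) can be sketched as a search for the feature with the largest absolute Pearson correlation per gene. Keying features by `(gene, feature_type)` tuples is an assumption for illustration, and the sketch only checks for a same-gene match. Deciding whether a different gene is "related" (the green category) would need annotation data not modeled here. All names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_feature_per_gene(dependencies, features):
    """For each gene, the (gene, feature_type) key with the largest |r| against its dependency."""
    return {
        gene: max(features, key=lambda key: abs(pearson(profile, features[key])))
        for gene, profile in dependencies.items()
    }


# Hypothetical profiles over four cell lines.
dependencies = {
    "GENE_A": [-1.0, -0.6, -0.2, 0.2],
    "GENE_B": [0.0, -1.0, 0.0, -1.0],
}
features = {
    ("GENE_A", "expression"): [4.0, 3.0, 2.0, 1.0],
    ("GENE_B", "copy_number"): [1.0, 2.0, 2.0, 1.0],
    ("GENE_C", "mutation"): [0.0, 1.0, 0.0, 1.0],
}

top = top_feature_per_gene(dependencies, features)
same_gene = sum(best[0] == gene for gene, best in top.items())
print(top)
print("genes whose top feature is their own:", same_gene)
```

Here GENE_A's dependency tracks its own expression, while GENE_B's best match is a mutation in a different gene, so only one gene falls in the same-gene category.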

References

1. Tsherniak A, et al. Defining a cancer dependency map. Cell. 2017;170:564–576. doi: 10.1016/j.cell.2017.06.010.
2. McDonald ER, et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell. 2017;170:577–592. doi: 10.1016/j.cell.2017.07.005.
3. Marcotte R, et al. Functional genomic landscape of human breast cancer drivers, vulnerabilities, and resistance. Cell. 2016;164:293–309. doi: 10.1016/j.cell.2015.11.062.
4. Jackson AL, et al. Widespread siRNA “off-target” transcript silencing mediated by seed region sequence complementarity. RNA. 2006;12:1179–1187. doi: 10.1261/rna.25706.
5. Birmingham A, et al. 3′ UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat. Methods. 2006;3:199–204. doi: 10.1038/nmeth854.
