Abstract

Background: Large-scale RNAi screening has become an important technology for identifying genes involved in biological processes of interest. However, the quality of large-scale RNAi screening is often degraded by off-target effects. To find statistically significant effector genes for pathogen entry, we systematically analyzed entry pathways in human host cells for eight pathogens using image-based kinome-wide siRNA screens with siRNAs from three vendors. We propose a Parallel Mixed Model (PMM) approach that simultaneously analyzes several non-identical screens performed with the same RNAi libraries.

Results: We show that PMM gains statistical power for hit detection through parallel screening. PMM can incorporate siRNA weights assigned according to available information on RNAi quality. Moreover, PMM estimates a sharedness score that can be used to focus follow-up efforts on generic or specific gene regulators. By fitting a PMM to our data, we found several novel hit genes for most of the pathogens studied.

Conclusions: Our results show that parallel RNAi screening can improve the results of individual screens. This is particularly relevant now that large-scale parallel datasets are becoming increasingly publicly available. Our comprehensive siRNA dataset provides a public, freely available resource for further statistical and biological analyses in the high-content, high-throughput siRNA screening field.
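
The model described in the abstract can be sketched as a mixed model with a gene effect shared across pathogens and a pathogen-specific correction. The effect symbols follow the Figure 1 caption (overall gene effect a_g, correction b_pg, combined effect c_pg, siRNA weight w_s). The readout symbol y, the pathogen mean μ_p, the variance parameters, and the inverse-weight scaling of the residual variance are assumptions made for illustration; the abstract does not state them.

```latex
\begin{align}
y_{pgs} &= \mu_p + a_g + b_{pg} + \varepsilon_{pgs}, \\
a_g &\sim \mathcal{N}(0, \sigma_a^2), \qquad
b_{pg} \sim \mathcal{N}(0, \sigma_b^2), \qquad
\varepsilon_{pgs} \sim \mathcal{N}\!\left(0, \sigma^2 / w_s\right), \\
c_{pg} &= a_g + b_{pg}.
\end{align}
```

Under this reading, a gene that affects many pathogens has a large a_g and borrows strength from all parallel screens, while a pathogen-specific hit is carried mainly by b_pg.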

Figures

Figure 1
Overview of InfectX high-content datasets, image analysis, and Parallel Mixed Model (PMM). (A) The figure shows example images of the different pathogens after siRNA transfection and the infection phase. The arrows indicate typical infectious phenotypes for each pathogen. The list shows an example of three single-cell features that we extracted to identify infected cells for L. monocytogenes. The scale bar has a length of 50 μm. (B) For each selected feature, we used histograms to define the threshold that best separated uninfected from infected cells. We used the thresholds in the Decision Tree Infection Scoring (DTIS) algorithm to classify cells as infected (green) or non-infected (red). We optimized this procedure for each pathogen separately. (C) For each well in a 384-well assay plate, we calculated the infection index by dividing the number of infected cells (green) by the total number of cells (green and red). (D) The figure shows a schematic representation of the input data for the statistical analysis. Each point represents the average infection index over all its replicate wells (wells with the same siRNA set targeting the same gene and pathogen). (E) The PMM algorithm fits an overall effect a_g, modeled with a normal distribution, to all data for gene g. The second plot shows the correction of the overall effect a_g within each pathogen by an estimate b_pg, which yields a pathogen- and gene-specific effect c_pg. The different sizes of the data points refer to the weights w_s, which can be incorporated in the PMM to reflect the quality of each siRNA. (F) The figure shows a schematic representation of the final output of PMM. The model estimates gene effects c_pg for each gene and pathogen and provides corresponding local False Discovery Rates q_pg.
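
The thresholding and infection-index steps in panels (B) and (C) can be sketched as follows. This is a minimal illustration, not the DTIS implementation: the feature names, the threshold values, and the rule that a cell counts as infected only when every feature exceeds its threshold are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical per-feature thresholds; in the study, thresholds were chosen
# per pathogen from histograms of single-cell features.
THRESHOLDS: Dict[str, float] = {"intensity": 0.5, "area": 20.0}


def is_infected(cell: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Assumed rule: a cell is infected if all features exceed their thresholds."""
    return all(cell[name] > cut for name, cut in thresholds.items())


def infection_index(cells: List[Dict[str, float]], thresholds: Dict[str, float]) -> float:
    """Number of infected cells divided by the total number of cells in a well."""
    if not cells:
        raise ValueError("well contains no cells")
    infected = sum(is_infected(c, thresholds) for c in cells)
    return infected / len(cells)


well = [
    {"intensity": 0.9, "area": 35.0},  # infected
    {"intensity": 0.2, "area": 40.0},  # below the intensity threshold
    {"intensity": 0.7, "area": 10.0},  # below the area threshold
    {"intensity": 0.8, "area": 25.0},  # infected
]
print(infection_index(well, THRESHOLDS))  # 0.5
```

The per-well index is then averaged over replicate wells to give one data point per siRNA set, gene, and pathogen, as in panel (D).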
Figure 2
Using more siRNAs adds power and yields reproducible results. (A) The three boxplots show Pearson correlation coefficients R between screens performed using the same siRNA set. The numbers 1 to 3 correspond to the number of replicate screens that we averaged and compared with another, distinct set of replicate screens averaged over the same number. We resampled the replicate screens up to 500 times. The scatter plot shows an example of the correlation of infection indices between the duplicate Adenovirus Dharmacon pooled screens. (B) The set of six boxplots shows the Pearson correlation coefficients of the averaged readouts from 1 to 6 siRNA sets. The scatter plots depict the correlation of infection indices for Adenovirus, the first between two different single siRNAs and the second between two averages, each over six siRNAs.
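
The effect of averaging over more siRNA sets on the correlation can be illustrated with simulated data. This is a sketch under simplifying assumptions, not the resampling procedure used for the figure: each gene has a fixed true effect, each siRNA set measures it with independent Gaussian noise, and the gene count and noise level are arbitrary.

```python
import random
from statistics import mean
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def averaged_readout(true_effect: List[float], n_sets: int,
                     noise_sd: float, rng: random.Random) -> List[float]:
    """Average n_sets noisy measurements of each gene's true effect."""
    return [
        mean(t + rng.gauss(0.0, noise_sd) for _ in range(n_sets))
        for t in true_effect
    ]


rng = random.Random(0)
true_effect = [rng.gauss(0.0, 1.0) for _ in range(800)]  # simulated genes

correlations = {}
for n_sets in (1, 3, 6):
    a = averaged_readout(true_effect, n_sets, 1.0, rng)
    b = averaged_readout(true_effect, n_sets, 1.0, rng)
    correlations[n_sets] = pearson(a, b)
    print(n_sets, round(correlations[n_sets], 2))
```

In this toy setting the correlation between two independent averages rises with the number of averaged sets, because averaging shrinks the noise while the shared gene effect stays the same.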
Figure 3
Parallel screens add power to find more shared hits. (A) We varied the number of siRNAs and pathogens used and calculated the rank of MET for L. monocytogenes in the ordered list of hit genes. We used PMM (and MTT, the Moderated T-Test, for the case of one pathogen) over 1000 random resampling rounds with replacement. The color corresponds to the variation of the observed ranks. The boxplot shows that MET is a unique strong hit among the studied pathogens. Stars indicate boxplots that are significantly different from 0 (one-sample t-test, p < 0.05). (B) The figure shows the same experiment as in (A), but with MTOR for Vaccinia virus. The boxplot shows that MTOR is a shared significant hit for several pathogens. (C) The figure shows the same experiment as in (A), but with the non-hit ALK for B. abortus as a control.
Figure 4
Statistics on the siRNA libraries used and on hits. (A) We weighted siRNAs based on their library quality. Each vertical compartment in the plot corresponds to a training set of siRNAs. We averaged data in the training set from the siRNAs of the specific manufacturers. Each boxplot corresponds to a test set of single siRNAs from different manufacturers (except “Dharm. siRNA mean”, which is the average of 4 Dharmacon unpooled siRNAs). The y-axis shows Pearson correlation coefficients R between the training and test sets. A star indicates a significant difference in the correlation coefficients (Mann–Whitney U test, p < 0.05) between pairs of manufacturers. We used all screens and both the infection index and cell number well readouts in the analysis. We used the results of this analysis to assign weights to siRNAs from different library manufacturers, as shown below the plot. (B) The histogram shows the FDR q-values obtained from all screens using the infection index readouts. The red line shows the FDR threshold of 0.4. (C) The bar plot shows the number of up and down hits for the different pathogens. (D) The bar plot shows the number of hit genes that were shared between pathogens.
Figure 5
Summary of screening hits for all pathogens. (A) The heat map shows all genes that were significant (FDR < 0.4) for at least one pathogen. We ordered the genes by their c-values averaged over all pathogens. The colors correspond to the estimated c-values. The black outlines indicate significant hits (FDR < 0.4) and the green outlines highlight the strongest down and up hits for each pathogen. The rightmost column shows the sharedness score for each gene. (B) The network shows the hit genes (FDR < 0.4 for at least one pathogen) and their direct neighbors that had connections between kinases in the STRING database (version 9.0). The edges are functional interactions in the STRING database with an edge threshold of 850. We removed genes that were not connected to any other gene from the network. Each node consists of a colored pie chart, in which each piece corresponds to a pathogen.
Figure 6
Performance statistics of hit ranking methods. (A) The figure shows stability curves for the three methods (PMM, MTT and RSA). The y-axis denotes the number of genes that were found with probability higher than 0.7 (dashed lines) and 0.9 (solid lines) in the top k (x-axis) of the list of ranked genes. The curves show the average over all eight pathogens. (B) The figure shows hit overlaps of cross-validated siRNA sets between the set of 10 unpooled siRNA libraries and the remaining siRNA library for the three tested gene ranking methods as a function of the hit threshold k. The curves show the average over all eight pathogens. (C) The figure shows ROC curves for PMM, MTT and RSA applied to simulated data containing only hits that were shared between all pathogens. The dashed and solid lines indicate whether the hits were generated with a low or a high shift away from zero. The PMM method outperformed the reference hit detection method. (D) The figure shows ROC curves for PMM, MTT and RSA applied to simulated data containing only unique hits for all pathogens. PMM and the Moderated T-Test performed equally well. (E) The figure shows ROC curves for simulated data with a mixed hit structure of both unique and shared hits. The PMM method outperformed the reference hit detection method.
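
The stability measure in panel (A) can be sketched as counting, for a given k, the genes that land in the top k in more than a chosen fraction of resampled rankings. The function name, the toy gene labels, and the way the rankings are generated below are assumptions for illustration; the caption does not specify how the resampling was done.

```python
import random
from collections import Counter
from typing import Sequence


def stable_gene_count(rankings: Sequence[Sequence[str]], k: int,
                      min_prob: float) -> int:
    """Count genes that appear in the top k of more than min_prob of the rankings."""
    counts = Counter(gene for ranking in rankings for gene in ranking[:k])
    return sum(1 for c in counts.values() if c / len(rankings) > min_prob)


# Toy rankings: three strong genes always occupy the first three ranks
# (in random order); all other genes are shuffled in every round.
rng = random.Random(1)
strong = ["G1", "G2", "G3"]
weak = [f"G{i}" for i in range(4, 41)]
rankings = []
for _ in range(200):
    top, rest = strong[:], weak[:]
    rng.shuffle(top)
    rng.shuffle(rest)
    rankings.append(top + rest)

for k in (3, 5, 10):
    print(k, stable_gene_count(rankings, k, 0.7), stable_gene_count(rankings, k, 0.9))
```

Plotting this count against k for each ranking method gives a curve of the kind described in panel (A); a method with more reproducible rankings yields a higher curve.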
Figure 7
Summary of differences between PMM top hits and those of other hit scoring methods. (A) The y-axis shows the PMM gene ranking for L. monocytogenes. The x-axis shows the same ranking computed after we randomized the other 7 parallel assays. The colors correspond to hit genes (FDR < 0.4) in the different cases. Parallelism had only a slight effect on the ranking, but added genes to the list of significant hit genes. (B) The scatter plot shows the PMM hit ranking (y-axis) compared with the MTT hit ranking (x-axis) for L. monocytogenes. The dot size corresponds to the sharedness score of each gene. Some genes with high sharedness scores gained statistical power.

Cited by 15 articles
