Nat Protoc. 2019 Feb;14(2):415-440.
doi: 10.1038/s41596-018-0099-1.

Using BEAN-counter to Quantify Genetic Interactions From Multiplexed Barcode Sequencing Experiments

Free PMC article

Scott W Simpkins et al. Nat Protoc. 2019.

Abstract

The construction of genome-wide mutant collections has enabled high-throughput, high-dimensional quantitative characterization of gene and chemical function, particularly via genetic and chemical-genetic interaction experiments. As the throughput of such experiments increases with improvements in sequencing technology and sample multiplexing, appropriate tools must be developed to handle the large volume of data produced. Here, we describe our approach to high-throughput, fitness-based profiling of pooled yeast mutant collections and its analysis with the BEAN-counter (Barcoded Experiment Analysis for Next-generation sequencing) software pipeline. The software has also successfully processed data from Schizosaccharomyces pombe, Escherichia coli, and Zymomonas mobilis mutant collections. We provide general recommendations for the design of large-scale, multiplexed barcode sequencing experiments. The procedure outlined here was used to score interactions for ~4 million chemical-by-mutant combinations in our recently published chemical-genetic interaction screen of nearly 14,000 chemical compounds across seven diverse compound collections. To demonstrate our analysis pipeline, we use a representative subset of these data. BEAN-counter is open source, written in Python, and freely available for academic use. Users should be proficient at the command line; advanced users who wish to analyze larger datasets with hundreds of conditions or more should also be familiar with concepts in the analysis of high-throughput biological data. BEAN-counter encapsulates the knowledge we have accumulated from, and successfully applied to, our multiplexed, pooled barcode sequencing experiments. This protocol will be useful to those interested in generating their own high-dimensional, quantitative characterizations of gene or chemical function in a high-throughput manner.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

A license is required to use the BEAN-counter software (http://z.umn.edu/beanctr). It is free for academic use and must be purchased on a per-project basis for commercial use.

Figures

Figure 1. Overview of multiplexed barcode sequencing experiments and their processing using the BEAN-counter software.
(a) A collection of barcoded yeast mutants (denoted by color) is subjected to both treatment and negative control conditions, followed by PCR amplification of the genetic barcode sequences using indexed primers (indexing sequences indicated with black/gray specify different conditions) and ultimately, massively parallel sequencing. (b) The core of the BEAN-counter software is a pipeline to process raw sequencing reads into interaction z-scores, which are calculated by comparing each condition’s mutant abundance profile against that of a mean profile derived from negative control conditions. Here, a positive chemical-genetic (CG) interaction score reflects a mutant’s resistance to a compound and is depicted with yellow in all heatmaps. Negative interaction scores reflect mutant sensitivities to compounds and are depicted with blue. BEAN-counter also provides additional post-processing tools to remove systematic effects and/or common yet uninformative signal typically observed in our pooled screening datasets.
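The scoring idea in panel (b), comparing each condition's mutant abundance profile against a mean negative-control profile, can be sketched in a few lines of NumPy. This is a simplified illustration, not BEAN-counter's implementation: the function name `interaction_zscores`, the pseudocount, the log2 relative-abundance transform and the single pooled scale factor are all assumptions made for clarity, and the real pipeline performs normalization steps that are not reproduced here.

```python
import numpy as np

def interaction_zscores(counts, is_control, pseudocount=1.0):
    """Simplified interaction scores against the mean negative-control profile.

    counts: (n_mutants, n_conditions) raw barcode read counts.
    is_control: boolean mask (length n_conditions) marking negative controls.
    Positive scores mean a mutant is more abundant than expected (resistance);
    negative scores mean it is depleted (sensitivity).
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    is_control = np.asarray(is_control, dtype=bool)
    if is_control.sum() < 2:
        raise ValueError("need at least two negative-control conditions")
    # Relative abundance of each mutant within each condition, on a log2 scale.
    log_ab = np.log2(counts / counts.sum(axis=0, keepdims=True))
    # Expected abundance: mean of the negative-control profiles for each mutant.
    expected = log_ab[:, is_control].mean(axis=1, keepdims=True)
    deviation = log_ab - expected
    # Assumed scale: one pooled spread of all control deviations for the whole
    # screen, so inherently noisy mutants still stand out in the controls.
    scale = deviation[:, is_control].std(ddof=1)
    if scale == 0:
        raise ValueError("negative controls show no variation")
    return deviation / scale
```

With three control conditions and one treatment, `interaction_zscores([[100, 110, 90, 10], [100, 90, 110, 190]], [True, True, True, False])` gives mutant 0 a negative score (depleted, sensitive) and mutant 1 a positive score in the treatment column.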
Figure 2. Design of large-scale, pooled and multiplexed chemical-genetic interaction screens.
(a) A barcoded collection of mutants is pooled and then competitively grown in the presence of different chemical compounds. (b) Scheme for the generation of PCR amplicons from pooled competition experiments that enables a high degree of sample multiplexing (768-plex in our experiments). (c) Typical layout of positive and negative control conditions in each screening plate. (d) Per-flow-cell sequencing scheme to maximize coverage of index tags with negative control conditions. Each column represents the samples combined into each sequencing lane (labeled L1 – L8 for each lane in a HiSeq flow cell), and each row represents the samples amplified with a specific plate of 96 unique indexed primers (labeled P1 – P8; 768 unique indexed primers in total). The different configurations are optimal for barcoded mutant collections of different sizes. We achieved 768-plex in our large-scale chemical-genomic screen of ~300 diagnostic mutants. A screen against a larger mutant collection, however, requires less sample multiplexing to ensure sufficient sequencing depth for each sample: a 384-plex scheme is preferable for collections of ~1000 mutants (such as a collection of yeast essential gene mutants), and a 96-plex scheme for collections of ~4000 mutants (such as the entire yeast nonessential deletion collection).
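The trade-off between multiplexing and depth in panel (d) is simple arithmetic: if a lane's reads were split evenly, each barcode in each sample would receive about lane_reads / (plex × n_mutants) reads. The helpers below are hypothetical, and the lane read count in the example is an arbitrary round number chosen for easy arithmetic, not a HiSeq specification.

```python
def mean_reads_per_barcode(lane_reads, plex, n_mutants):
    """Average reads per barcode per sample, assuming an even split of the lane."""
    return lane_reads / (plex * n_mutants)

def max_plex(lane_reads, n_mutants, min_reads, options=(768, 384, 96)):
    """Largest multiplexing level whose mean depth meets a target, or None."""
    for plex in sorted(options, reverse=True):
        if mean_reads_per_barcode(lane_reads, plex, n_mutants) >= min_reads:
            return plex
    return None

# Example with a hypothetical lane of 230.4 million reads:
# 300 mutants at 768-plex gives 1000 reads per barcode per sample.
print(mean_reads_per_barcode(230_400_000, 768, 300))
```

Note that 768 × 300 ≈ 230,000, while 384 × 1000 and 96 × 4000 are both 384,000, so the recommended configurations keep the number of barcode-sample combinations per lane within a similar range.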
Figure 3. Schematic of the steps involved in processing large-scale interaction screens using BEAN-counter.
(a) Individual stages of the process_screen.py script (Steps 6, 16, and 17 in PROCEDURE) for scoring interactions from raw sequencing data. The right column shows the locations of important output files within the <dir>/output/ folder. (b) BEAN-counter provides post-processing tools to visualize and remove systematic biases and uninformative signal from the matrix of interaction z-scores, which is originally generated by process_screen.py. The user must determine the sequence of post-processing steps based on the severity and removability of these unwanted signals. The software also includes tools to collapse profiles across replicate conditions and to export the data in text-based formats for browsing in Java TreeView or further analysis in other programming languages. The bottom of each relevant box lists the PROCEDURE steps in which that command is invoked.
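Collapsing replicate conditions and exporting a text matrix, as described for panel (b), amounts to averaging replicate columns and writing a tab-delimited table. The sketch below is not BEAN-counter's export code: the function names, the `mutant` header label and the three-decimal formatting are assumptions, and the output is a generic tab-delimited table rather than a specific Java TreeView file format.

```python
import csv
import io
import numpy as np

def collapse_replicates(z, condition_names):
    """Average z-score columns that share a condition name (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    unique = list(dict.fromkeys(condition_names))  # keep first-seen order
    cols = [
        z[:, [i for i, c in enumerate(condition_names) if c == name]].mean(axis=1)
        for name in unique
    ]
    return np.column_stack(cols), unique

def to_tsv(z, mutant_names, condition_names):
    """Render a tab-delimited matrix: a header row of conditions, one row per mutant."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["mutant"] + list(condition_names))
    for name, row in zip(mutant_names, z):
        writer.writerow([name] + [f"{v:.3f}" for v in row])
    return buf.getvalue()
```

Averaging is only one way to collapse replicates; a median would be more robust to a single aberrant replicate.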
Figure 4. The large signature that we observe in and remove from most of our datasets.
(a,b) Clustered heatmap representations of chemical-genetic interaction profiles from our large screen of the RIKEN Natural Product Depository. Each pixel in the heatmap represents an interaction z-score that reflects the deviation of the observed barcode abundance from that expected in a given condition. Rows and columns are clustered by hierarchical agglomerative clustering with average linkage and 1 – Pearson’s correlation coefficient as the distance metric. (a) Chemical-genetic interaction profiles obtained directly after interaction scoring. (b) Chemical-genetic interaction profiles after removal of one SVD component. (c) Histogram of the data in (a) showing the mean profile correlation (Pearson’s correlation coefficient) within each group of compound replicates (“same compound,” mean group correlation = 0.78) or between each pair of compounds (“different compound,” mean group correlation = 0.15). (d) Histogram of the data in (b) showing the mean profile correlation within each group of compound replicates (mean = 0.73) or between each pair of compounds (mean = 0.01). (e) Precision-recall and receiver operating characteristic curves evaluating the ability of profile correlation to predict whether two profiles were generated by the same compound, for datasets with 0 and 1 SVD components removed.
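The clustering in panels (a,b), average linkage with 1 - Pearson correlation as the distance, can be written out naively to make the procedure explicit. This version rescans every pair of clusters at each merge, so it is only suitable for small illustrative inputs; the function names are hypothetical. In practice, `scipy.cluster.hierarchy.linkage(profiles, method="average", metric="correlation")` performs the same kind of clustering efficiently.

```python
import numpy as np

def pearson_distance(profiles):
    """Pairwise 1 - Pearson correlation between the rows of `profiles`."""
    return 1.0 - np.corrcoef(np.asarray(profiles, dtype=float))

def average_linkage(dist):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns merges as (members_a, members_b, distance), in merge order.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

Because the distance is based on correlation, two profiles with the same shape but different magnitudes sit close together, which suits comparing interaction profiles whose overall strength varies between compounds.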
Figure 5. An inoculation date-related effect in one of our datasets.
(Box 1) (a) Chemical-genetic interaction profiles computed from data not partitioned into inoculation date-based groups. (b) Chemical-genetic interaction profiles computed on data partitioned into inoculation date-based groups. (c) Precision-recall and receiver operating characteristic analyses evaluating the ability of profile correlation to predict if two profiles were derived from inoculations performed on the same date, for interaction profiles computed from non-partitioned (“combined”) and inoculation date-partitioned (“per-date”) data.
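Partitioning by inoculation date, as in panel (b), means each condition is compared only with negative controls inoculated on the same date. A minimal sketch, assuming log-scale abundances are already available; the function name and the choice to raise an error for a date with no controls are illustrative assumptions, not BEAN-counter behavior.

```python
import numpy as np

def scores_by_group(log_ab, is_control, groups):
    """Deviation of each condition from the control mean of its own group.

    log_ab: (n_mutants, n_conditions) log-scale abundances.
    is_control: boolean mask of negative-control conditions.
    groups: one label (e.g., inoculation date) per condition.
    """
    log_ab = np.asarray(log_ab, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    groups = np.asarray(groups)
    out = np.full_like(log_ab, np.nan)
    for g in np.unique(groups):
        in_group = groups == g
        ctrl = in_group & is_control
        if not ctrl.any():
            raise ValueError(f"group {g!r} has no negative controls")
        # Expected profile for this group only.
        expected = log_ab[:, ctrl].mean(axis=1, keepdims=True)
        out[:, in_group] = log_ab[:, in_group] - expected
    return out
```

For a single mutant with values [1, 1, 3, 3] across two dates (controls in columns 0 and 2), pooling all controls would give deviations of [-1, -1, 1, 1], a pure date effect, whereas the per-date version returns zeros.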
Figure 6. Typical barcode and index tag abundance distributions.
Substantial deviations from these distributions could be the result of errors in experimental or computational procedures and should be investigated. (a) Distribution of reads across index tags. (b) Distribution of reads across genetic barcodes.
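A quick way to spot the deviations mentioned above is to compare each index tag's (or barcode's) total read count with the median total. The function and its 10%-of-median cutoff are hypothetical choices for illustration, not thresholds taken from BEAN-counter.

```python
import numpy as np

def flag_low_totals(counts, axis, min_fraction_of_median=0.1):
    """Flag index tags or barcodes whose total reads fall far below the median.

    counts: (n_barcodes, n_index_tags) read-count matrix.
    axis=0 sums over barcodes, giving one total per index tag (column);
    axis=1 sums over index tags, giving one total per barcode (row).
    Returns (indices of flagged entries, all totals).
    """
    totals = np.asarray(counts, dtype=float).sum(axis=axis)
    cutoff = min_fraction_of_median * np.median(totals)
    return np.flatnonzero(totals < cutoff), totals
```

Flagged entries are candidates for investigation rather than automatic removal; a low-count index tag, for example, may point to a failed PCR or a sample-sheet error.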
Figure 7. Manual examination of the dataset to flag mutants and conditions for removal.
(a) Chemical-genetic interaction profiles before manual removal of conditions and mutants (generated in Java TreeView). Profiles for positive control conditions, conditions flagged for removal, and mutants flagged for removal are expanded for emphasis (MMS: methyl methanesulfonate). Mutants were flagged for removal from the dataset based on high variability of interaction signal (resulting in interactions with most negative control conditions) and undesired behavior in conditions of high growth inhibition (< 50% growth compared to negative control conditions). The 36 conditions flagged for manual removal from the dataset could be divided into three classes (from left to right in the heatmap inset): 1) treatment conditions that exhibit almost exclusively positive interactions; 2) negative control profiles that share a signal inconsistent with the other negative control profiles; and 3) a combination of negative control and treatment profiles that share a similar set of strong, negatively interacting strains. (b) Interaction profiles for all negative experimental control conditions (DMSO), from both DMSO-only plates and the four DMSO controls on each compound-containing plate. (c) Chemical-genetic interaction profiles after manual removal of conditions and mutants.
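The first mutant-flagging criterion, apparent interactions in most negative-control conditions, could be screened for automatically before manual inspection. This is a hypothetical heuristic: the |z| cutoff of 2 and the 50% fraction are illustrative assumptions, and it does not capture the growth-inhibition criterion or the three classes of flagged conditions described above.

```python
import numpy as np

def flag_noisy_mutants(z, is_control, z_cutoff=2.0, max_fraction=0.5):
    """Flag mutants that 'interact' with too many negative-control conditions.

    z: (n_mutants, n_conditions) interaction scores.
    is_control: boolean mask of negative-control conditions.
    Returns (indices of flagged mutants, per-mutant fraction of controls
    in which |z| exceeds z_cutoff).
    """
    z = np.asarray(z, dtype=float)
    ctrl = np.abs(z[:, np.asarray(is_control, dtype=bool)])
    # Treatment columns are ignored; controls should show no real interactions.
    frac = (ctrl > z_cutoff).mean(axis=1)
    return np.flatnonzero(frac > max_fraction), frac
```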
Figure 8. Analysis of same-compound, same-index tag, and same-lane correlations to detect the presence of batch effects and uninformative signal.
(a) Analysis of same-compound (replicate) profile correlations. The histogram shows the mean profile correlation (Pearson’s correlation coefficient) within each group (“same compound,” mean = 0.75) or between groups (“different compound,” mean = 0.13) of compound replicates. The precision-recall and receiver operating characteristic (ROC) curves show the ability of compound replicate correlations to predict compound replicate status. (b) Analysis of same-index tag profile correlations. The histogram shows the mean profile correlation within each group (“same index tag,” mean = 0.11) or between groups (“different index tag,” mean = 0.06) of conditions amplified with the same indexed primer. The precision-recall and ROC curves show the ability of profile correlations to predict if two conditions were amplified with the same indexed primer. (c) Analysis of same-lane profile correlations. The histogram shows the mean profile correlation within each group (“same lane,” mean = 0.11) or between groups (“different lane,” mean = 0.05) of conditions sequenced in the same HiSeq lane. The precision-recall and ROC curves show the ability of profile correlations to predict if two conditions were sequenced in the same HiSeq lane. The format of the plots was modified slightly from the default BEAN-counter output.
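The same-compound, same-index-tag and same-lane analyses share one computation applied to different group labels. The sketch below computes within-group and between-group mean correlations for any labeling, and summarizes how well pairwise correlation predicts shared group membership with a rank-based ROC AUC (the Mann-Whitney formulation); precision-recall curves are omitted. All function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def group_mean_correlations(profiles, labels):
    """Mean Pearson correlation within each group and between each pair of groups.

    profiles: (n_conditions, n_mutants); labels: one group label per condition
    (compound, index tag, or sequencing lane).
    """
    corr = np.corrcoef(np.asarray(profiles, dtype=float))
    labels = np.asarray(labels)
    groups = {g: np.flatnonzero(labels == g) for g in np.unique(labels)}
    within, between = [], []
    for idx in groups.values():
        if len(idx) > 1:  # singleton groups have no within-group pairs
            sub = corr[np.ix_(idx, idx)]
            within.append(sub[np.triu_indices(len(idx), k=1)].mean())
    for a, b in combinations(groups.values(), 2):
        between.append(corr[np.ix_(a, b)].mean())
    return np.array(within), np.array(between)

def pairwise_same_group(profiles, labels):
    """Correlation and same-group flag for every unordered pair of conditions."""
    corr = np.corrcoef(np.asarray(profiles, dtype=float))
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    return corr[i, j], labels[i] == labels[j]

def roc_auc(scores, is_positive):
    """Probability that a positive pair outscores a negative pair (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    pos_mask = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[pos_mask], scores[~pos_mask]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC near 0.5 for index-tag or lane labels suggests little batch signal, while values well above 0.5 indicate that conditions sharing a tag or lane resemble each other more than expected.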
Figure 9. Removal of large, uninformative signature via singular value decomposition (SVD).
(a) Chemical-genetic interaction profiles after the first SVD component was removed from the data. (b) Chemical-genetic interaction profiles after the first two SVD components were removed from the data. (c) Histogram showing the mean profile correlation within each group (mean = 0.71) or between groups (mean = 0.02) of compound replicates after removal of one SVD component. (d) Same as (c), but for two SVD components removed (within-group mean = 0.74, between-group mean = 0.01). (e) Precision-recall and receiver operating characteristic analyses of compound replicate correlations after removing 0 to 4 SVD components.
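Removing the first k SVD components, as in panels (a,b), subtracts the best rank-k approximation of the interaction-score matrix. A minimal NumPy sketch; whether the matrix is centered before decomposition is not stated here, so this version (an assumption) decomposes the matrix as given, and the function name is hypothetical.

```python
import numpy as np

def remove_svd_components(matrix, k):
    """Subtract the top-k singular components from a (mutants x conditions) matrix."""
    X = np.asarray(matrix, dtype=float)
    if k == 0:
        return X.copy()
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-k reconstruction: sum of the k largest singular triplets.
    top = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return X - top
```

Rerunning a replicate-correlation analysis on the output for k = 0 through 4 would give the kind of comparison shown in panel (e); removing too many components eventually strips out informative signal along with the shared, uninformative signature.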
