Mol Syst Biol. 2016 Apr 22;12(4):863.
doi: 10.15252/msb.20156660.

Pooled-matrix Protein Interaction Screens Using Barcode Fusion Genetics

Free PMC article

Nozomu Yachie et al. Mol Syst Biol.


High-throughput binary protein interaction mapping is continuing to extend our understanding of cellular function and disease mechanisms. However, we remain one or two orders of magnitude away from a complete interaction map for humans and other major model organisms. Completion will require screening at substantially larger scales with many complementary assays, requiring further efficiency gains in proteome-scale interaction mapping. Here, we report Barcode Fusion Genetics-Yeast Two-Hybrid (BFG-Y2H), by which a full matrix of protein pairs can be screened in a single multiplexed strain pool. BFG-Y2H uses Cre recombination to fuse DNA barcodes from distinct plasmids, generating chimeric protein-pair barcodes that can be quantified via next-generation sequencing. We applied BFG-Y2H to four different matrices ranging in scale from ~25 K to 2.5 M protein pairs. The results show that BFG-Y2H increases the efficiency of protein matrix screening, with quality that is on par with state-of-the-art Y2H methods.

Keywords: DNA barcode; interactome; next‐generation sequencing; protein interaction; yeast two‐hybrid.


Figure 1. Principle of Barcode Fusion Genetics, a technology to generate fused barcodes that uniquely identify the presence of a specific combination of engineered loci

Each cell carries two engineered loci, such that each locus is identified by the presence of a barcode flanked by site‐specific recombination sites. In the presence of Cre recombinase, a double‐crossover DNA recombination is induced to form chimeric “fused” barcodes that represent the combination of loci.

Multiple pairwise combinations of reagents can be tested in a pool. Fused barcodes can be amplified and deep‐sequenced to quantify the abundance of cells corresponding to each X‐Y combination.

Figure EV1. Design of BFG‐Y2H plasmids and strains

Design of barcoded bait and prey destination plasmids. Each destination plasmid carries two DNA barcodes (BC1 and BC2) interdigitated with loxP and lox2272 sites. The barcoded bait or prey destination vectors harbor loxP'‐BC1‐lox2272‐linker‐BC2 or BC1‐linker‐loxP'‐BC2‐lox2272 fragments at the PspOMI restriction site for the in‐yeast assembly‐based BFG‐Y2H (red arrow), and at the SacI restriction site for the en masse recombinational cloning‐based BFG‐Y2H (blue arrow) so as to allow identification of barcode–ORF combinations in a paired‐end sequencing read (loxP' denotes the reverse complement of loxP). Each of the BC1 and BC2 regions is composed of a unique 25‐bp DNA barcode flanked by common forward and reverse priming sites to allow barcode amplification by PCR: O059 and O060 sites for bait‐BC1; O061 and O062 sites for bait‐BC2; O063 and O064 sites for prey‐BC1; and O065 and O066 PCR priming sites for prey‐BC2.

BFG‐Y2H toolkit strains RY1010 (MATa) and RY1030 (MATα) were constructed so that Cre recombinase expression can be induced only within diploids obtained by mating the two toolkit strains. A P_CMV‐rtTA‐KanMX4 (toolkit‐a cassette) fragment replaced the CAN1 region of the Y8800 chromosome, and a T_ADH1‐P_tetO2‐Cre‐T_CYC1‐KanMX4 (toolkit‐α cassette) fragment replaced the CAN1 region of Y8930.

Illustration of Tet‐On system‐based Cre expression. In the presence of doxycycline (dox), rtTA protein is activated, producing Cre via the tetO2 promoter.

Genotyping PCRs to confirm the toolkit strains. Successful creation of the toolkit strains was confirmed by direct PCRs for the strains RY1010, RY1030, Y8800, and Y8930. For each strain, the presence of the 5′ and 3′ regions of the wild‐type CAN1 was checked with O067 and O068 primers (TKgt1 PCR), and O069 and O070 primers (TKgt2 PCR), respectively. The integration of the toolkit‐a cassette at the CAN1 locus was checked by TKgt3 (amplification of the CAN1 5′/P_CMV boundary by O067 and O071 primers), TKgt4 (the rtTA‐encoding region by O072 and O073 primers), and TKgtMX4 PCRs (the MX4/CAN1 3′ boundary by O074 and O070 primers). Finally, the integration of the toolkit‐α cassette was checked by TKgt5 (the CAN1 5′/T_ADH1 boundary by O067 and O075 primers), TKgt6 (the Cre‐encoding region by O076 and O077 primers), and TKgtMX4 PCRs. All primer sequences can be found in Table EV5.

Figure 2. Design of the BFG‐Y2H technology

A pool of diploid cells, potentially expressing all possible pairwise combinations of bait and prey fusion proteins, is generated via en masse yeast mating, in which a haploid pool of bait strains (MATα) is mated with a pool of prey strains (MATa). Diploid cells surviving the Y2H selection are pooled, and Cre recombinase is induced to swap the positions of the bait‐BC1 and prey‐BC2, generating chimeric BC1‐BC1 and BC2‐BC2 barcode fusions, each of which uniquely identifies a candidate X‐Y interaction. Cells are then lysed, plasmids are extracted, and a DNA sequencing library is prepared by PCR for both BC1‐BC1 and BC2‐BC2 fused barcodes. Finally, protein interactions are identified according to the enrichment of sequencing read counts for fused barcodes corresponding to particular protein pairs.

Figure EV2. Proof‐of‐principle demonstrations of BFG‐Y2H

Major and shortest recombination pathways by which bait‐BC1 and prey‐BC2 are physically swapped between bait and prey plasmids.

Barcode swapping on the bait plasmid, demonstrated by CHX treatment of various diploid strains to counter‐select CYH2‐encoding plasmids (see Fig EV1 for detail on the TKgt1 and TKgt3 genotyping PCRs). Evidence of plasmid loss due to CHX treatment is indicated by “*”. “RY” represents our toolkit strain background, and “Y” represents the Y‐strain background commonly used in recent Y2H experiments (Appendix Note S1).

Genotyping PCR of fused and unfused barcodes in various strain backgrounds (see Fig EV1 for description of TKgt1/2/3/5 PCRs). A PCR product corresponding to a fused barcode is indicated by “*”.

Frequencies of BC1‐BC1 and BC2‐BC2 fused barcodes obtained from a mixture of two diploid strains, such that each diploid strain harbored uniquely barcoded bait and prey plasmids.

Demonstration of small‐scale BFG‐Y2H. (E) 14 positive reference set (PRS) pairs (navy cells) and 7 random reference set (RRS) pairs (gray cells) were chosen from the CCSB human PRS and RRS version 1 (hsPRSRRSv1). The 14 PRS pairs were reported to be Y2H positive in both of the X‐Y and Y‐X configurations by pairwise testing. (F) BFG‐Y2H screens for the small‐scale matrix with –His+3‐AT, –Ade, and no‐selection control (+His +Ade) conditions.

Figure 3. Massively parallel generation of barcoded Y2H strains

Library‐scale in‐yeast assembly to generate Y2H strains carrying barcoded ORF‐expressing plasmids. In each reaction, the Gal4 DNA binding or activation domain, and ORF, barcode and backbone DNA fragments were directly assembled in vivo in either the toolkit‐a or toolkit‐α strain background.

Barcoded Y2H strains derived by in‐yeast assembly. Colony growth indicates yeast cells harboring the correctly assembled plasmids. The yellow boxes denote “no ORF fragment” negative controls.

Quality confirmation of in‐yeast assembly‐based barcoded strain generation. After in‐yeast assembly, single colonies were isolated and the DNA fragments were recovered by yeast colony PCR. “TK” denotes genotyping PCR to confirm the presence of the chromosomal locus that defines the toolkit strains.

Figure EV3. Rapid generation of DNA barcode collections

Two oligonucleotide DNA pools harboring random 25‐bp sequence regions were combined with site‐specific recombination sites (loxP and lox2272) by PCR and assembled with a linear plasmid backbone fragment by Gibson assembly. The resulting randomly barcoded plasmid pool was transformed into E. coli cells.

Single randomly barcoded E. coli colonies were picked and isolated into 384‐well plates, and the pair of unique barcodes in each well was identified by row–column plate (RCP)‐PCR (Appendix Note S2). RCP‐PCR was designed to determine the identity and plate and well location of barcode sequences in many strains arrayed in microwell plates via a single next‐generation sequencing run. In a 384‐well format reaction, 16 forward primers with row‐specific DNA index tags and 24 reverse primers with column‐specific DNA index tags are distributed to their corresponding row and column positions. The forward and reverse row‐/column‐specific primers also have plate primer landing sites on their ends.

RC‐PCRs are performed for individual plates to stitch row and column tags to each barcode. RC‐PCR products are then pooled by plates. A Plate‐PCR stitches plate index tags and Illumina paired‐end sequencing adapters to each RC‐PCR product pool. The plate‐PCR products are pooled and sequenced en masse to identify the sequence of barcode regions at every row–column plate coordinate.
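The row, column, and plate indexing described above can be illustrated with a short sketch. It assumes each sequencing read has already been parsed into a plate tag, a row tag, a column tag, and a barcode. The tag names (`R00`, `C00`, `P01`), the toy 4-bp barcodes, and the function `assign_barcodes` are hypothetical placeholders, not sequences or software from the paper. Each well is assigned its most frequent barcode, which is one simple way to tolerate occasional sequencing errors.

```python
from collections import Counter, defaultdict
import string

# Hypothetical index tags: 16 row tags (A-P) and 24 column tags (1-24)
# cover one 384-well plate; plate tags distinguish plates in the pool.
ROW_TAGS = {f"R{i:02d}": string.ascii_uppercase[i] for i in range(16)}
COL_TAGS = {f"C{j:02d}": j + 1 for j in range(24)}
PLATE_TAGS = {"P01": 1, "P02": 2}


def assign_barcodes(reads):
    """Map (plate, well) to the most frequent barcode seen at that coordinate.

    reads: iterable of (plate_tag, row_tag, col_tag, barcode) tuples.
    Reads carrying an unknown tag are skipped.
    """
    counts = defaultdict(Counter)
    for plate_tag, row_tag, col_tag, barcode in reads:
        if (plate_tag not in PLATE_TAGS
                or row_tag not in ROW_TAGS
                or col_tag not in COL_TAGS):
            continue
        well = f"{ROW_TAGS[row_tag]}{COL_TAGS[col_tag]}"
        counts[(PLATE_TAGS[plate_tag], well)][barcode] += 1
    return {coord: c.most_common(1)[0][0] for coord, c in counts.items()}


# Toy reads (real barcodes in the text are 25 bp; these are shortened).
example_reads = [
    ("P01", "R00", "C00", "ACGT"),
    ("P01", "R00", "C00", "ACGT"),
    ("P01", "R00", "C00", "ACGA"),  # minority variant, outvoted
    ("P02", "R15", "C23", "TTTT"),
    ("P09", "R00", "C00", "GGGG"),  # unknown plate tag, ignored
]
calls = assign_barcodes(example_reads)
```

Because 16 row tags and 24 column tags address all 384 wells, only 40 indexed primers plus one plate tag per plate are needed, rather than one primer pair per well.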

Figure 4. Screening coverage, reproducibility, and other features of BFG‐Y2H CENT screen

Normalized fused‐barcode abundance is shown for 1) non‐selective conditions, based on observed fused‐barcode abundance at a sequencing depth that is only sufficient for accurately determining barcode marginal abundance (“+His observed at low saturation”); 2) non‐selective conditions, as inferred from marginal abundance of single‐barcode frequencies (“+His inferred”), and 3) selective conditions based on observed fused‐barcode abundance (“–His” and “3‐AT”).
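One plausible reading of the “+His inferred” abundance is that, without selection, the expected frequency of each bait–prey pair is the product of the marginal bait and prey frequencies. The sketch below illustrates that reading. The independence assumption, the function name `infer_pair_frequencies`, and the toy counts are assumptions for illustration, not the authors' exact computation.

```python
def infer_pair_frequencies(counts):
    """Estimate pair frequencies from marginal (row/column total) abundances.

    counts: dict mapping (bait, prey) to an observed fused-barcode read count.
    Returns a dict mapping every (bait, prey) combination to the product of
    the bait's and the prey's marginal frequencies.
    """
    total = sum(counts.values())
    bait_totals, prey_totals = {}, {}
    for (bait, prey), n in counts.items():
        bait_totals[bait] = bait_totals.get(bait, 0) + n
        prey_totals[prey] = prey_totals.get(prey, 0) + n
    return {
        (b, p): (bait_totals[b] / total) * (prey_totals[p] / total)
        for b in bait_totals
        for p in prey_totals
    }


# Toy data: the X2-Y2 pair was never observed at this shallow depth,
# yet it still receives an inferred frequency from the marginals.
toy_counts = {("X1", "Y1"): 6, ("X1", "Y2"): 2, ("X2", "Y1"): 2}
inferred = infer_pair_frequencies(toy_counts)
```

This shows why a sequencing depth that only resolves marginal abundances can still be useful: pairs too rare to be sampled directly receive an estimate from their row and column totals.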

Average of normalized fused‐barcode count for each ORF pair (f_average) in (B) the non‐selective (+His) condition and (C) the selective (–His) condition. CS: calibration set space spiked in the screen.

Correlation of f_average between different pairs of replicate types in the selective conditions (scatter plots are log‐scale).

Analysis of barcode fusion efficiency. Frequencies of 7‐bp flanking motif combinations located upstream and downstream of loxP (yellow arrow) or lox2272 (green arrow) sites were analyzed by Illumina Nextera sequencing for the –His condition.

Interaction score profile for the CENT screen with parameters optimized according to the Matthews correlation coefficient (MCC) to recapture previously reported Y2H interactions.
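The Matthews correlation coefficient used here is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows how a single score threshold could be chosen to maximize MCC against a set of known interactions. Reducing the optimization to one threshold is a simplification, and the names `mcc` and `best_threshold` and the toy scores are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(scores, known):
    """Scan candidate thresholds and return (threshold, MCC) with maximal MCC.

    scores: dict mapping a protein pair to its interaction score.
    known: set of pairs treated as true interactions.
    A pair is called positive when its score is >= the threshold.
    """
    best = (None, -1.0)
    for t in sorted(set(scores.values())):
        tp = fp = tn = fn = 0
        for pair, s in scores.items():
            called, positive = s >= t, pair in known
            if called and positive:
                tp += 1
            elif called:
                fp += 1
            elif positive:
                fn += 1
            else:
                tn += 1
        m = mcc(tp, fp, tn, fn)
        if m > best[1]:
            best = (t, m)
    return best


toy_scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.2, ("B", "D"): 0.1}
toy_known = {("A", "B"), ("A", "C")}
threshold, score = best_threshold(toy_scores, toy_known)
```

MCC uses all four cells of the confusion matrix, so it remains informative when true interactions are a small minority of the tested pairs.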

Figure EV4. Additional information on the CENT screen

Distribution of fused‐barcode counts in the non‐selective (+His), selective (–His), and stringent selective (3‐AT) conditions with (A) and without (B) the seven auto‐activators.

Distributions of normalized row‐total abundances and column‐total abundances in the non‐selective condition (+His, with auto‐activators), which serve as inferred distributions of pre‐mating bait and prey haploid strain abundance, respectively.

Information on the CENT screen performed without the seven auto‐activators. (D) Distribution of normalized fused‐barcode abundance observed in non‐selective conditions (+His observed at low saturation), inferred for the non‐selective condition using row‐ and column‐total abundances (+His inferred), and observed in the selective conditions (−His and 3‐AT). (E) Average of normalized fused‐barcode count for each ORF pair (f_average) in the non‐selective (+His) condition and the selective (−His) condition. CS: calibration set space spiked in the screen.

Correlation between BFG‐Y2H interaction scores and GPCA luciferase intensities. The background MCC heat map demonstrates overlap of the two datasets at each threshold combination.

Figure EV5. Sequencing of extracted plasmid pools after induction of barcode fusion

Schematic diagram of the experimental flow. After the yeast plasmid DNA extraction during the CENT screen, in a separate experimental procedure, plasmid DNA samples of +His and –His conditions were amplified by φ29 polymerase‐based rolling circle amplification (RCA). Fused barcodes were then amplified by PCR from the RCA‐treated DNA pool and sequenced (Illumina MiSeq). In parallel, entire‐plasmid DNA pools were sequenced (Illumina Nextera library preparation sequenced on an Illumina MiSeq).

Counts of Nextera sequencing reads mapped using appropriate reference databases.

Human ORFs found among entire‐plasmid sequencing reads. ORFs were sorted according to the read density, defined as their read counts divided by ORF length (reads/bp). Yellow bars denote centrosomal ORFs interrogated in the screen; and black bars denote “unexpected” ORFs. The sequencing result of the +His condition covered all of the centrosomal ORFs, while some centrosomal ORFs were not found in the –His Y2H screening condition (dropouts), as expected given that not all ORFs encode interacting proteins.
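Read density as defined above (read count divided by ORF length, in reads/bp) can be computed and ranked with a few lines. The ORF names, lengths, and counts below are made‑up toy values, and `read_density` and `rank_by_density` are hypothetical helper names.

```python
def read_density(read_counts, orf_lengths):
    """Return reads per bp for every ORF with a known, positive length.

    read_counts: dict mapping ORF name to mapped read count.
    orf_lengths: dict mapping ORF name to ORF length in bp.
    ORFs absent from read_counts get a density of 0.
    """
    return {
        orf: read_counts.get(orf, 0) / length
        for orf, length in orf_lengths.items()
        if length > 0
    }


def rank_by_density(densities):
    """Sort ORF names from highest to lowest read density."""
    return sorted(densities, key=densities.get, reverse=True)


toy_lengths = {"ORF_A": 1000, "ORF_B": 500, "ORF_C": 2000}
toy_counts = {"ORF_A": 200, "ORF_B": 150, "ORF_C": 100}
densities = read_density(toy_counts, toy_lengths)
ranking = rank_by_density(densities)
```

Dividing by length keeps long ORFs from ranking highly simply because they offer more sequence for reads to map to.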

Precision‐recall curve assessing separation of centrosomal ORFs from non‐centrosomal ORFs by read density among ORFs found by Nextera sequencing.

Correlation between ORF read densities and corresponding barcode copy numbers.

Estimation of barcode fusion efficiency from heptamer–lox–heptamer combinations found in the Nextera sequencing reads of the +His condition (the −His condition data can be found in Fig 4E). Yellow and green arrows denote loxP and lox2272 sites, respectively.

Figure 5. BFG‐Y2H efficiently captures protein interactions

Top 100 protein pairs scored by BFG‐Y2H, and their presence in a high‐quality literature‐curated protein interaction set (Lit‐BM‐13), a recent systematic high‐quality human interactome dataset (HI‐II‐14), or the curated BioGRID protein interaction dataset (see Materials and Methods). “Union” represents the union of interacting protein pairs in Lit‐BM‐13, HI‐II‐14, and BioGRID.

Performance in recovering previously reported interactions (“Union”).

Recovery rate by GPCA for BFG‐Y2H‐positive (+) versus BFG‐Y2H‐negative (−) hits and pairwise retest‐positive (+) versus retest‐negative (−) hits.

Distribution of GPCA luciferase intensities (quadruplicates) for protein pairs in the positive control (defined as the overlap of the GPCA‐tested space with the union of the HI‐II‐14 and Lit‐BM‐13 datasets, Rolland et al, 2014; Table EV2), rank 1–55, 56–100, pairwise Y2H retested positives, auto‐activators in the pairwise Y2H pipeline and BFG‐Y2H negative pairs. *P < 0.05, **P < 10⁻⁵, and ***P < 10⁻¹⁵ (Mann‐Whitney U‐test).

HAUS1 hits captured by BFG‐Y2H.

Fold enrichment of residue contacts at protein interfaces for different interaction score thresholds. Fold‐change is calculated as the ratio of the average number of residue contacts for the two groups of protein pairs separated by each interaction score threshold. P‐value was calculated using the Mann–Whitney U‐test.

Figure 6. Scalable generation of barcoded bait and prey strains based on a pooled recombinational cloning reaction

Schematic representation of the en masse recombinational cloning process. Randomly barcoded bait or prey destination plasmid pool was combined with a pool of entry ORF plasmids and subjected to a Gateway LR reaction. Randomly barcoded ORF expression clones were isolated by bacterial transformation and colony picking and identified by sequencing.

Generation of BFG‐Y2H‐ready bait and prey haploid pools by en masse transformation of purified barcoded bait and prey expression plasmid pools to the appropriate mating type yeast cells.

Fraction of ORFs assigned to at least n barcodes indicated on the horizontal axis.

Attrition of ORFs and their lengths at steps of the en masse recombinational cloning‐based BFG‐Y2H procedure. **P < 10⁻⁴ and ***P < 10⁻⁷.

Figure 7. Scalability and performance of BFG‐Y2H

Schematic representation of the increasing size of the four protein pair spaces tested (CENT, CCC, CV, and CVA).

Protein interaction networks identified by each BFG‐Y2H screen. Red lines indicate novel interactions, blue lines indicate previously known interactions (those in the “Union” set) captured by BFG‐Y2H, and gray lines denote known interactions among proteins in the hit list that were not captured by BFG‐Y2H.

Sub‐matrices for the 18 calibration pairs that were commonly tested in all of the four screens. The X and Y ORFs were ordered to present calibration pairs on the diagonal.

Overlap between CV and CVA interactions.

The performance of each BFG‐Y2H screen was measured using Lit‐BM‐13 and compared with that of HI‐II‐14 after restricting both screening spaces to their common ORFs.

Numbers of protein interactions among virhostome proteins (V‐V) and among COSMIC cancer proteins (C‐C), and number of virhostome interactions targeted by the same viral proteins. Gray bars show the expectations from networks generated by random edge rewiring.



