PLoS Comput Biol. 2009 Sep;5(9):e1000515.
doi: 10.1371/journal.pcbi.1000515. Epub 2009 Sep 25.

A Novel Scoring Approach for Protein Co-Purification Data Reveals High Interaction Specificity

Free PMC article

Xueping Yu et al. PLoS Comput Biol.

Abstract

Large-scale protein interaction networks (PINs) have typically been discerned using affinity purification followed by mass spectrometry (AP/MS) and yeast two-hybrid (Y2H) techniques. It is generally recognized that Y2H screens detect direct binary interactions while the AP/MS method captures co-complex associations; however, the latter technique is known to yield prevalent false positives arising from a number of effects, including abundance. We describe a novel approach to compute the propensity for two proteins to co-purify in an AP/MS data set, thereby allowing us to assess the detected level of interaction specificity by analyzing the corresponding distribution of interaction scores. We find that two recent AP/MS data sets of yeast contain enrichments of specific, or high-scoring, associations as compared to commensurate random profiles, and that curated, direct physical interactions in two prominent databases have consistently high scores. Our scored interaction data sets are generally more comprehensive than those of previous studies when compared against four diverse, high-quality reference sets. Furthermore, we find that our scored data sets are more enriched with curated, direct physical associations than Y2H sets. A high-confidence protein interaction network (PIN) derived from the AP/MS data is revealed to be highly modular, and we show that this topology is not the result of misrepresenting indirect associations as direct interactions. In fact, we propose that the modularity in Y2H data sets may be underrepresented, as they contain indirect associations that are significantly enriched with false negatives. The AP/MS PIN is also found to contain significant assortative mixing; however, in line with a previous study we confirm that Y2H interaction data show weak disassortativeness, thus revealing more clearly the distinctive natures of the interaction detection methods. We expect that our scored yeast data sets are ideal for further biological discovery and that our scoring system will prove useful for other AP/MS data sets.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Co-occurrence significance (CS) scores measure the interaction specificity for two proteins in AP/MS data.
(A) Flow chart for the computation of CS scores. (B) Illustration for protein pair Tub1∶Tub2 showing an overrepresented co-occurrence in the purification data of Gavin et al.: 156 (observed) vs. 61.2 (σ = 6.0) (random), with a corresponding CS score of 15.8. (C) Illustration for protein pair Ssa1∶Ssa2 showing an underrepresented co-occurrence in the purification data of Gavin et al.: 65 (observed) vs. 187.6 (σ = 6.6) (random), with a corresponding CS score of −18.7. (D) Total score distributions of experimental data sets and corresponding average distributions from 10⁵ random (shuffled) realizations (see Materials and Methods). (E) Score distributions, in the purification data of Gavin et al., of selected curated interactions in the MIPS and SGD-Biogrid (SBMC2) repositories (see Materials and Methods), showing their measured high specificities.
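
The figures quoted for panels B and C are consistent with reading the CS score as a Z-score: the observed co-occurrence count minus the mean count over shuffled data, divided by the standard deviation of the shuffled counts. That reading is an assumption inferred from the caption, not the authors' stated formula (the full definition is in Materials and Methods), and the function name `z_score` is illustrative only. A minimal sketch of the arithmetic:

```python
def z_score(observed: float, random_mean: float, random_sd: float) -> float:
    """Number of standard deviations by which `observed` departs from the random mean."""
    return (observed - random_mean) / random_sd


# Values as printed in the caption for panels B and C (already rounded there).
tub1_tub2 = z_score(156, 61.2, 6.0)
ssa1_ssa2 = z_score(65, 187.6, 6.6)

print(round(tub1_tub2, 1))  # 15.8
print(round(ssa1_ssa2, 1))  # -18.6
```

With the rounded caption values, Tub1∶Tub2 reproduces the reported 15.8, while Ssa1∶Ssa2 gives about −18.6 rather than the reported −18.7; the small gap is plausibly due to rounding of the mean and σ in the caption.
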
Figure 2. Evaluation of the IDBOS scoring scheme.
Coverage versus accuracy data (see Materials and Methods) comparing the scoring schemes of IDBOS (this work) and Collins et al. when applied to the purification data of Gavin et al. Four diverse reference interaction data sets were used: (A) BGS; (B) PCA; (C) SBMC2; and (D) MIPS. See Materials and Methods for full descriptions of these references. Also shown are the scored data of Hart et al. (determined by multiplying individual results across the Gavin et al., Krogan et al., and Ho et al. AP/MS data sets) and evaluations for the Y2H data sets of Yu et al. (CCSB-YI1), Ito et al. (core subset), Uetz et al., and a union of these data sets (Y2H-union).
Figure 3. Abundance effects in high-confidence PINs derived from AP/MS data.
The association between protein degree and abundance in high-confidence PINs derived by (A) the IDBOS procedure (this work) and (B) Collins et al., from the AP/MS data sets of Gavin et al. and Krogan et al. Proteins were sorted by increasing abundance, as measured by Newman et al., into 11 classes. Undetectable low-abundance proteins comprised class 0, while the remaining proteins were sorted into 10 equally sized classes. The sizes of class 0/classes 1–10 were as follows: 231/92 for the IDBOS-Gavin PIN; 265/68 for the IDBOS-Krogan (MALDI) PIN; 424/101 for the IDBOS-Krogan (LCMS) PIN; 238/87 for the Collins-Gavin PIN; and 384/111 for the Collins-Krogan (MALDI+LCMS) PIN. For each class, we determined the significance of the average degree, as a Z-score, compared to the network average and standard deviation determined from equivalently sized, randomly compiled pools (10⁴ realizations). The enclosed rectangular areas represent |Z|<2.6 (P>0.05 after multiple-test correction).
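
The per-class significance test described in this caption can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `class_degree_z`, the toy degree table, and the choice to sample pools without replacement are all assumptions, and the baseline here is the mean over random pools, which approximates the network average.

```python
import random
import statistics


def class_degree_z(degrees, class_members, realizations=1000, seed=0):
    """Z-score of a class's average degree against equally sized random pools.

    degrees: dict mapping protein name -> degree in the network.
    class_members: proteins belonging to one abundance class.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    proteins = sorted(degrees)
    size = len(class_members)
    observed = statistics.mean(degrees[p] for p in class_members)
    # Average degree of each randomly compiled pool of the same size.
    pool_means = [
        statistics.mean(degrees[p] for p in rng.sample(proteins, size))
        for _ in range(realizations)
    ]
    return (observed - statistics.mean(pool_means)) / statistics.stdev(pool_means)


# Hypothetical toy network: 50 proteins with degrees 1..50.
toy_degrees = {f"P{i}": i for i in range(1, 51)}
high_class = [f"P{i}" for i in range(41, 51)]
low_class = [f"P{i}" for i in range(1, 11)]

z_high = class_degree_z(toy_degrees, high_class)
z_low = class_degree_z(toy_degrees, low_class)
print(z_high > 2.6, z_low < -2.6)
```

In this toy case the high-degree class falls well outside the |Z|<2.6 band and the low-degree class falls outside it on the negative side, which is the kind of departure the enclosed rectangles in the figure are meant to flag.
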
Figure 4. The high-confidence IDBOS-Gavin PIN is highly modular.
Depictions of (A) the high-confidence IDBOS-Gavin PIN and (B) a commensurate, degree-preserving random network. (C) Enrichments of numbers of disjoined parts in the IDBOS-Gavin PIN and the Y2H data sets of Yu et al. (CCSB-YI1), Ito et al. (core subset), Uetz et al., and a union of these data sets (Y2H-union). Expected values and standard deviations (SD) were computed from 1000 realizations of commensurate, degree-preserving random networks. (D) Clustering coefficients of the IDBOS-Gavin PIN and experimental Y2H data sets. The inset shows average clustering coefficients by degree for the IDBOS-Gavin PIN and two realizations of a commensurate, degree-preserving random network. (E) Coverage versus accuracy data for the weakest links in the IDBOS-Gavin PIN using the BGS reference set (see Materials and Methods). Also shown are coverage-accuracy values for the Y2H data sets.
Figure 5. Indirect associations in the IDBOS-Gavin PIN and Y2H data sets are enriched with false negatives.
An indirect association occurs when two non-interacting proteins share an interaction partner, e.g., A and B represent an indirect association in the case of A–C–B. Indirect associations form a subset of all non-interactions. A false negative is defined as a non-interaction that is curated as a direct physical interaction in a reference set: (A) BGS, (B) SBMC2 (see Materials and Methods). The fraction of indirect associations that are false negatives (actual) was compared with the fraction of all non-interactions that are false negatives (expected). Enrichments were computed as ratios of actual/expected.
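
The definitions in this caption translate fairly directly into set operations. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the toy network, and the toy reference set are hypothetical, and the set of all non-interactions is taken to be every unlinked pair among the proteins present in the network.

```python
from itertools import combinations


def indirect_associations(edges):
    """Pairs of proteins that share a partner but are not directly linked."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    direct = {frozenset(e) for e in edges}
    indirect = set()
    for partners in neighbors.values():
        # Any two partners of the same protein form a candidate A-C-B path.
        for a, b in combinations(sorted(partners), 2):
            pair = frozenset((a, b))
            if pair not in direct:
                indirect.add(pair)
    return indirect


def false_negative_enrichment(edges, reference):
    """Ratio of actual to expected false-negative fractions (assumes non-empty sets)."""
    nodes = sorted({p for e in edges for p in e})
    direct = {frozenset(e) for e in edges}
    ref = {frozenset(e) for e in reference}
    non_interactions = {frozenset(p) for p in combinations(nodes, 2)} - direct
    indirect = indirect_associations(edges)
    fn_indirect = len(indirect & ref)       # false negatives among indirect associations
    fn_all = len(non_interactions & ref)    # false negatives among all non-interactions
    # (fn_indirect / |indirect|) / (fn_all / |non_interactions|), kept in integers
    return (fn_indirect * len(non_interactions)) / (len(indirect) * fn_all)


# Hypothetical toy data: A-C-B makes (A, B) an indirect association.
toy_edges = [("A", "C"), ("C", "B"), ("D", "E")]
toy_reference = [("A", "B"), ("A", "C")]
print(false_negative_enrichment(toy_edges, toy_reference))  # 7.0
```

Here the single indirect association (A, B) is curated in the toy reference, so the actual fraction is 1/1, while only 1 of the 7 non-interacting pairs is curated, giving an enrichment of 7.
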
Figure 6. High-confidence AP/MS interaction data shows assortative mixing while Y2H interaction data shows disassortative mixing.
(A) Power-law-like degree distribution of the IDBOS-Gavin PIN and of a commensurate, completely random Erdős-Rényi (ER) graph. Enrichments (Z-scores) of interaction frequencies, relative to commensurate, degree-preserving random networks (10⁴ realizations), between pairs of degrees in the (B) IDBOS-Gavin PIN, (C) Y2H-union data set, and (D) BGS curated interaction set (see Materials and Methods). Most red indicates Z≥5 (overrepresented) and most green indicates Z≤−5 (underrepresented).

References

    1. Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol. 2007;3:88.
    2. Wu X, Jiang R, Zhang MQ, Li S. Network-based global inference of human disease genes. Mol Syst Biol. 2008;4:189.
    3. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140.
    4. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001;98:4569–4574.
    5. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
