The CRAPome: a contaminant repository for affinity purification-mass spectrometry data

Dattatreya Mellacheruvu et al. Nat Methods. 2013 Aug.

Abstract

Affinity purification coupled with mass spectrometry (AP-MS) is a widely used approach for the identification of protein-protein interactions. However, for any given protein of interest, determining which of the identified polypeptides represent bona fide interactors versus those that are background contaminants (for example, proteins that interact with the solid-phase support, affinity reagent or epitope tag) is a challenging task. The standard approach is to identify nonspecific interactions using one or more negative-control purifications, but many small-scale AP-MS studies do not capture a complete, accurate background protein set when available controls are limited. Fortunately, negative controls are largely bait independent. Hence, aggregating negative controls from multiple AP-MS studies can increase coverage and improve the characterization of background associated with a given experimental protocol. Here we present the contaminant repository for affinity purification (the CRAPome) and describe its use for scoring protein-protein interactions. The repository (currently available for Homo sapiens and Saccharomyces cerevisiae) and computational tools are freely accessible at http://www.crapome.org/.
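The aggregation idea described above can be sketched in a few lines. This is a minimal illustration, not the CRAPome's actual pipeline: it assumes each negative-control run is available as a mapping from gene symbol to spectral count, and the function name `contaminant_summary` and the example counts are hypothetical.

```python
def contaminant_summary(control_runs):
    """Pool negative-control runs into a per-protein background profile.

    control_runs: list of dicts mapping gene symbol -> spectral count
    (an assumed input format, chosen for illustration only).
    Returns, for each protein, the fraction of runs in which it was detected
    and its mean spectral count over the runs where it was detected.
    """
    n_runs = len(control_runs)
    detected = {}
    for run in control_runs:
        for gene, count in run.items():
            if count > 0:
                entry = detected.setdefault(gene, {"runs": 0, "total": 0})
                entry["runs"] += 1
                entry["total"] += count
    return {
        gene: {
            "frequency": entry["runs"] / n_runs,
            "mean_count_when_detected": entry["total"] / entry["runs"],
        }
        for gene, entry in detected.items()
    }


# Hypothetical spectral counts from three control purifications.
runs = [
    {"HSPA8": 12, "TUBB": 3},
    {"HSPA8": 8},
    {"HSPA8": 10, "TUBB": 0, "ACTB": 4},
]
summary = contaminant_summary(runs)
```

Adding more control runs to the list widens the set of background proteins seen at least once, which is the coverage argument the abstract makes for pooling controls across studies.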

Figures

Figure 1
The CRAPome at a glance. (a) Creation of the CRAPome. (1) Contributors to the CRAPome submit raw MS files for negative control runs, detailed experimental protocols and mapping information. (2) Raw MS files are first converted to mzXML and analyzed by X!Tandem and the Trans-Proteomic Pipeline; counts are extracted for protein quantification and the CRAPome administrator performs a quality control check (see Methods). (3) Released high quality runs (data) are associated with experimental descriptions and protocols (metadata) by the CRAPome administrator in consultation with the data provider. (4) Query of the CRAPome database by external users via the web interface. (b) Overview of the first CRAPome workflow. (1) Proteins are queried against the CRAPome by inputting one of several identifiers (Supplementary Note) which enable mapping to Gene ID. Different views enable exploration of the contaminant profile of each queried protein, either as a summary table (2) or in graphical formats (3). (c) Overview of the third CRAPome workflow (note that the second workflow is similar, except that no user data is uploaded; the second workflow generates lists of contaminant proteins). (1) Desired controls are selected, with the help of CVs. (2) Users upload their own data (test experiments and controls if available) to the CRAPome and (3) select parameters for data analysis. Data is displayed in a table format and in different graphical formats, which include the detection of a given interaction in the public repository iRefIndex (4).
Figure 2
Composition of the CRAPome (human data). (a) Relationship between the detection of a given protein in the CRAPome and its protein abundance (all entries are mapped to official gene identification numbers and displayed as corresponding gene symbols). The abundance distribution in HEK293 cells was calculated from shotgun mass spectrometry data (see Methods). The left axis indicates the number of proteins identified at each of the spectral count abundances (green circles; green dashed line shows fit to data); the right axis indicates the fraction of the proteins at a given binned abundance in the CRAPome database (blue triangles). (b) Similarity clusters of all experiments. All experiments in the CRAPome were scored for similarity in their contaminant profiles based on a cosine function: the size of the clusters represents the number of the experiments with strong similarity. Selected similarity clusters are indicated, alongside their composition. (c) Cluster ix, described in b as FLAG agarose in HeLa cells, can be further defined as two sub-clusters based on subcellular fractionation performed prior to the affinity purification (cytosolic and nuclear fractions); other clusters can also be further refined. (d) Example of epitope-tag specificity for selected proteins/genes. (e) Spectral count distribution of the proteins shown in d across the entire dataset. Spectral count bins are shown for all non-zero experiments. The highest spectral count boundary for each bin is shown.
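The cosine-based similarity in panel b can be illustrated as follows. The caption does not specify how profiles are encoded or weighted, so this sketch assumes each experiment is a mapping from gene symbol to spectral count; the function name and the example profiles are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two contaminant profiles.

    Each profile maps gene symbol -> spectral count (assumed encoding).
    Proteins missing from a profile are treated as having zero counts.
    """
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical control experiments.
flag_hela_1 = {"HSPA8": 10, "TUBB": 4, "ACTB": 2}
flag_hela_2 = {"HSPA8": 20, "TUBB": 8, "ACTB": 4}   # same shape, deeper sampling
other_prep = {"KRT1": 7, "KRT10": 5}                # no proteins in common

same = cosine_similarity(flag_hela_1, flag_hela_2)
different = cosine_similarity(flag_hela_1, other_prep)
```

Because cosine similarity depends on the direction of the count vector and not its length, two controls with the same relative contaminant profile score as highly similar even when one was sampled more deeply than the other.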
Figure 3
Scoring functions in the CRAPome illustrated on a four bait dataset (MEPCE, EIF4A2, WASL, RAF1; 8 experiments). (a) Comparison between the primary Fold Change score (FC-A) and SAINT for scoring known interactions using negative control runs (n = 6) provided by the user; ROC based on the interactions in iRefIndex. Note that when SAINT scores are identical, ties are broken by the FC-A score. Selected SAINT probability or FC-A score thresholds are represented by triangles and circles, respectively. (b) The relationship between SAINT probability and FC score is well represented by a sigmoid function (dashed curve). (c – d) Histogram visualization of the data presented in (b) can help with data exploration and threshold selection. (e – f) Scoring protein interactions using controls from the CRAPome with SAINT (e) and FC-A (f): User controls (n = 6) are compared to two sets of controls from the CRAPome, selected based on the CVs (Set 1 = 10 controls; Set 2 = 11 controls).
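The tie-breaking rule noted for panel a can be sketched as a ranking step followed by a standard ROC sweep. This is an illustrative reconstruction under stated assumptions, not the authors' evaluation code: the tuple layout `(prey, saint, fc_a)`, the function name `roc_points`, and the example scores are hypothetical, and the set of known interactions stands in for a lookup against iRefIndex.

```python
def roc_points(scored, known):
    """ROC points for interactions ranked by SAINT, with ties broken by FC-A.

    scored: list of (prey, saint_probability, fc_a) tuples (assumed layout).
    known:  set of preys treated as true interactors.
    Returns a list of (false_positive_rate, true_positive_rate) points.
    """
    # Sort by SAINT probability first; FC-A decides among equal probabilities.
    ranked = sorted(scored, key=lambda r: (r[1], r[2]), reverse=True)
    n_pos = sum(1 for r in ranked if r[0] in known)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for prey, _saint, _fc in ranked:
        if prey in known:
            tp += 1
        else:
            fp += 1
        points.append((
            fp / n_neg if n_neg else 0.0,
            tp / n_pos if n_pos else 0.0,
        ))
    return points


# Hypothetical scores: A and B share a SAINT probability, so FC-A orders them.
scored = [("A", 1.0, 5.0), ("B", 1.0, 2.0), ("C", 0.5, 9.0), ("D", 0.1, 1.0)]
points = roc_points(scored, known={"A", "C"})
```

Without the secondary key, the relative order of A and B would depend only on input order, which matters in practice because many high-confidence preys can share a SAINT probability of 1.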
Figure 4
Use of a more stringent Fold Change score (FC-B) to recover true interacting partners for ORC2L. (a) Schematic illustration of the consequences of averaging all spectral counts as opposed to selecting the top three maximal values for scoring protein-protein interactions. Here, protein X represents a contaminant in the purification scheme that is detected with variable counts across the 15 selected controls (the intensity of shading is proportional to the spectral counts). By contrast, protein Y is a contaminant detected with similar counts across all selected controls. The standard primary Fold Change calculation (FC-A) averages the counts across all controls, while the more stringent secondary Fold Change score (FC-B) takes the average of the three highest spectral counts for the abundance estimate. The resulting FC-A and FC-B scores are represented schematically, where a larger circle indicates a higher fold change; FC-A and FC-B assign a similar score to protein Y, but not to protein X. (b) Comparison of SAINT scoring and stringent FC-B with good bait samples. Note that only the top of the map (the interactions with SAINT probability ≥ 0.9) is displayed. (c) Same as b for bait samples (ORC2L) contaminated with myosin: the more stringent fold change score FC-B helps in discriminating between true interaction partners (labeled “ORC complex”) and contaminants (labeled “myosins”).
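The contrast drawn in panel a can be reproduced numerically. This is a simplified sketch based only on the caption: the background estimate is either the mean over all controls (FC-A style) or the mean of the three highest control counts (FC-B style). The pseudocount, the function names, and the example counts are assumptions added for illustration, and the published scores may include further normalization not described here.

```python
from statistics import mean


def fold_change_a(bait_count, control_counts, pseudocount=1.0):
    """FC-A-style score: bait count over the mean of all control counts.

    The pseudocount is an assumption to avoid division by zero.
    """
    return (bait_count + pseudocount) / (mean(control_counts) + pseudocount)


def fold_change_b(bait_count, control_counts, top_n=3, pseudocount=1.0):
    """FC-B-style score: bait count over the mean of the top_n control counts."""
    top = sorted(control_counts, reverse=True)[:top_n]
    return (bait_count + pseudocount) / (mean(top) + pseudocount)


# Hypothetical counts across 15 controls.
protein_x = [0] * 12 + [20, 25, 30]   # variable: absent in most, high in a few
protein_y = [5] * 15                  # uniform across controls
bait_count = 24

x_a = fold_change_a(bait_count, protein_x)   # mean background is 5
x_b = fold_change_b(bait_count, protein_x)   # top-3 background is 25
y_a = fold_change_a(bait_count, protein_y)
y_b = fold_change_b(bait_count, protein_y)
```

With these numbers, protein Y receives the same score from both functions, while protein X scores well above 1 under the FC-A-style average and below 1 under the FC-B-style top-three estimate, matching the schematic's point that sporadic, high-count contaminants are penalized by the more stringent score.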
