J Proteome Res. 2017 Dec 1;16(12):4374-4390.
doi: 10.1021/acs.jproteome.7b00388. Epub 2017 Oct 11.

Enhanced Missing Proteins Detection in NCI60 Cell Lines Using an Integrative Search Engine Approach

Free PMC article

Elizabeth Guruceaga et al. J Proteome Res.

Abstract

The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, the HPP teams have dedicated significant effort to the experimental detection of missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might represent a valuable resource for selecting a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow benefits from the complementarity of these search engines to increase proteome coverage. Five missing proteins compliant with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as was also proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the cell lines common to the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
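
The integration idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the input structure, the peptide and protein identifiers, and the helper names `integrate` and `split_by_evidence` are all hypothetical, and the only rule modeled is the count of unique peptides per protein (two or more versus exactly one). Any further requirements of the C-HPP guidelines, as well as FDR control, are assumed to have been handled upstream.

```python
from collections import defaultdict

# Hypothetical input: for each search engine, the (peptide, protein) pairs
# that are assumed to have already passed that engine's own filtering.
# All identifiers below are made up for illustration.
engine_hits = {
    "Comet":    {("PEPTIDEA", "P1"), ("PEPTIDEB", "P1"), ("PEPTIDEC", "P2")},
    "Mascot":   {("PEPTIDEA", "P1"), ("PEPTIDED", "P2")},
    "OMSSA":    {("PEPTIDEE", "P3")},
    "X!Tandem": {("PEPTIDEA", "P1"), ("PEPTIDEC", "P2")},
}


def integrate(hits_by_engine):
    """Pool the unique peptides of each protein across all search engines."""
    peptides_by_protein = defaultdict(set)
    for hits in hits_by_engine.values():
        for peptide, protein in hits:
            peptides_by_protein[protein].add(peptide)
    return dict(peptides_by_protein)


def split_by_evidence(peptides_by_protein, min_peptides=2):
    """Separate proteins with >= min_peptides unique peptides from the rest."""
    supported = sorted(
        p for p, peps in peptides_by_protein.items() if len(peps) >= min_peptides
    )
    weak = sorted(
        p for p, peps in peptides_by_protein.items() if len(peps) < min_peptides
    )
    return supported, weak


merged = integrate(engine_hits)
two_or_more, fewer = split_by_evidence(merged)
print("two or more unique peptides:", two_or_more)
print("fewer than two unique peptides:", fewer)
```

In this toy data, protein `P2` reaches two unique peptides only because Comet and Mascot each contribute a different one, which is the kind of complementarity between search engines that the abstract refers to.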

Keywords: C-HPP; CCLE; NCI60; integration of search engines; missing proteins; peptide detectability.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
(A) Overall scheme of the analysis pipeline developed to identify missing proteins. An integrative strategy based on the results of four search engines was used with the shotgun experiments of the NCI60 data set and the RNA-Seq experiments of the CCLE project.
Figure 2
Number of total PSMs obtained from the analysis of the NCI60 proteomic data set summarizing the results per search engine used and tissue of origin of the cell lines.
Figure 3
(A) Number of unique peptides detected with any of the four search engines. (B) Number of proteins detected following the C-HPP guidelines. For each cell line and experiment type (deep proteome or proteome profile), all the results obtained with the four search engines are represented.
Figure 4
(A) Number of unique peptides detected across chromosomes considering all the experiments analyzed and obtained for each of the four search engines used in the study. (B) Number of proteins identified using the C-HPP guidelines across chromosomes obtained for each of the four search engines used in the study. (C) Venn diagram representation of the unique peptides found per search engine considering all the experiments analyzed. (D) Venn diagram representation of the proteins found per search engine (C-HPP guidelines).
Figure 5
(A) Number of unique peptides associated with missing proteins per chromosome and search engine. (B) Number of missing proteins identified with at least one unique peptide. Proteins highlighted in black were identified with two unique peptides, following the C-HPP guidelines.
Figure 6
(A) Number of unique peptides associated with missing proteins separated per search engine. (B) Number of missing proteins identified with one (left) and two (right) unique peptides.
Figure 7
Interaction network of the detected missing proteins with the best score in IPA.
Figure 8
Transcript expression level distributions of protein coding, noncoding, and novel gene categories were compared in each of the 43 cell lines of the CCLE initiative.
Figure 9
(A) Expression level distribution of all the gene structures in the 43 cell lines analyzed, with quartiles Q1 and Q3 marked in red. (B) Number of genes expressed or highly expressed in each cell line and number of proteins identified in the corresponding proteomic experiments for the same cell lines (numbers of MiTranscriptome accessions are shown for transcriptomics and numbers of neXtProt accessions for proteomics). (C) Venn diagram of the intersections between expressed genes, highly expressed genes, and detected proteins in the set of 43 cell lines. (D) Number of missing proteins detected in each cell line and how many of their corresponding genes are expressed or highly expressed in the same cell lines. (E) Venn diagram of the intersections between expressed genes, highly expressed genes, and identified missing proteins in the set of 43 cell lines.
Figure 10
Performance evaluation of the peptide detectability classifiers is shown using ROC analysis with the test data set.
Figure 11
Percentage of predicted peptide detectability for distinct sets of peptides: nondetected peptides of the identified missing proteins, detected peptides of the identified missing proteins, detected peptides of nonmissing identified proteins, detected peptides of the proteins identified by all four search engines used (Common proteins), and detected peptides of the proteins identified by only one of the search engines (Comet, Mascot, OMSSA, and X!Tandem specific proteins). Peptides predicted to be detectable are shown in red, and peptides predicted to be not detectable are shown in blue.
