J Proteome Res. 2017 Dec 1;16(12):4403-4414.
doi: 10.1021/acs.jproteome.7b00423. Epub 2017 Oct 31.

Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy


Amr Elguoshy et al. J Proteome Res. 2017.


In an effort to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) began investigating missing proteins (MPs) in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still classified as missing and uncertain proteins, respectively. In this study, we propose a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of RNA expression patterns for missing proteins in the Human Protein Atlas showed that 28% of them, such as olfactory receptor 1I1 (O60431), had no RNA expression, suggesting that uncommon tissues need to be considered in transcriptomic and proteomic studies. Interestingly, 21% had an elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in the tissues where they are elevated. Additionally, analysis of RNA expression levels showed that 95% of missing proteins had no or low expression (0-10 transcripts per million), indicating that low abundance is one of the major obstacles to detecting missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than identified proteins. Searching for the predicted unique tryptic peptides of missing and uncertain proteins in the experimental peptide lists of open-access MS-based databases (PA, GPM) detected 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10⁻⁴)% FDR. Finally, matching the native spectra of the experimentally detected peptides with their SRMAtlas synthetic counterparts at three transition sources (QQQ, QTOF, QTRAP) allowed us to validate 41 missing proteins by ≥2 proteotypic peptides.
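The peptide-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cleavage rule (after K or R, not before P, no missed cleavages), the function names, and the toy accessions and sequences are all assumptions made for illustration. Only the two thresholds (at least two unique peptides of ≥9 aa) come from the abstract, and FDR estimation and SRMAtlas spectral matching are left out.

```python
from collections import defaultdict

MIN_LEN = 9        # minimum peptide length in aa (threshold from the abstract)
MIN_PEPTIDES = 2   # unique peptides required per protein (from the abstract)


def tryptic_digest(sequence):
    """In silico digest: cleave after K or R unless the next residue is P.

    Assumed rule; missed cleavages are not modeled.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def unique_peptides(proteome):
    """Map each accession to its peptides (>= MIN_LEN aa) found in no other protein."""
    owners = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_digest(sequence):
            if len(peptide) >= MIN_LEN:
                owners[peptide].add(accession)
    unique = defaultdict(set)
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            unique[next(iter(accessions))].add(peptide)
    return unique


def detected_proteins(proteome, candidates, observed_peptides):
    """Return candidates supported by >= MIN_PEPTIDES observed unique peptides."""
    unique = unique_peptides(proteome)
    hits = {}
    for accession in candidates:
        matched = unique.get(accession, set()) & observed_peptides
        if len(matched) >= MIN_PEPTIDES:
            hits[accession] = sorted(matched)
    return hits


# Toy data (hypothetical accessions and sequences, not real proteins).
shared = "M" + "A" * 9 + "K"
pep_c = "C" * 10 + "R"
pep_d = "D" * 10 + "K"
pep_g = "G" * 10 + "K"
proteome = {
    "P_MISSING": shared + pep_c + pep_d,     # two unique peptides
    "P_KNOWN": shared + "E" * 11 + "K",      # shares one peptide with P_MISSING
    "P_LOWCOV": pep_g + "HHHK",              # only one peptide of >= 9 aa
}
observed = {shared, pep_c, pep_d, pep_g}     # stands in for a PA/GPM peptide list
hits = detected_proteins(proteome, ["P_MISSING", "P_LOWCOV"], observed)
print(hits)
```

In this toy run only `P_MISSING` passes: the peptide it shares with `P_KNOWN` is discarded as non-unique, and `P_LOWCOV` has a single qualifying peptide, which falls short of the two-peptide requirement.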

Keywords: GPM; HPP; PA; SRMAtlas; missing proteins; uncertain proteins.


