Remarkable progress continues on the annotation of the proteins identified in the Human Proteome and on finding credible proteomic evidence for the expression of "missing proteins". Missing proteins are those with no previous protein-level evidence or insufficient evidence to make a confident identification upon reanalysis in PeptideAtlas and curation in neXtProt. Enhanced with several major new data sets published in 2014, the human proteome presented as neXtProt, version 2014-09-19, has 16,491 unique confident proteins (PE level 1), up from 13,664 at 2012-12 and 15,646 at 2013-09. That leaves 2948 missing proteins from genes classified having protein existence level PE 2, 3, or 4, as well as 616 dubious proteins at PE 5. Here, we document the progress of the HPP and discuss the importance of assessing the quality of evidence, confirming automated findings and considering alternative protein matches for spectra and peptides. We provide guidelines for proteomics investigators to apply in reporting newly identified proteins.
Keywords: Global Proteome Machine database (GPMDB); HPP metrics; Human Protein Atlas; Human Proteome Project; PeptideAtlas; guidelines; high-confidence protein identifications; missing proteins; neXtProt; novel proteins.