Proteogenomic Analysis to Identify Missing Proteins from Haploid Cell Lines

Proteomics. 2018 Apr;18(8):e1700386. doi: 10.1002/pmic.201700386.


The Chromosome-centric Human Proteome Project aims to identify and characterize the protein products encoded by all human protein-coding genes. As of early 2017, 19 837 protein-coding genes had been annotated in the neXtProt database, including 2691 missing proteins that have never been identified by mass spectrometry. Missing proteins may be of low abundance in many cell types or expressed in only a few cell types in the human body, such as sperm in the testis. In this study, we performed expression proteomics of two near-haploid cell lines, HAP1 and KBM-7, to hunt for missing proteins. Proteomes from the two haploid cell lines were analyzed on an LTQ Orbitrap Velos, producing a total of 200 raw mass spectrometry files. After applying a 1% false discovery rate at both the peptide-spectrum match and protein levels, more than 10 000 proteins were identified from HAP1 and KBM-7, including nine missing proteins. Next, unmatched spectra were searched against protein databases translated in three frames from noncoding RNAs derived from RNA-Seq data, yielding six novel protein-coding regions after careful manual inspection. This study demonstrates that expression proteomics coupled with proteogenomic analysis can be employed to identify many annotated and unannotated missing proteins.
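
The three-frame translation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `min_length` cutoff, and the choice to split translations at stop codons are assumptions made here for illustration only. It uses the standard genetic code and translates the forward strand of a transcript in frames 0, 1, and 2.

```python
from itertools import product

# Standard genetic code, with amino acids listed in T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq, frame):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq, min_length=7):
    """Return (frame, segment) pairs for stop-free stretches in all three frames.

    min_length is a hypothetical cutoff (not taken from the paper) that drops
    stretches too short to be useful as database entries.
    """
    entries = []
    for frame in range(3):
        protein = translate_frame(seq, frame)
        for segment in protein.split("*"):  # '*' marks a stop codon
            if len(segment) >= min_length:
                entries.append((frame, segment))
    return entries
```

Calling `three_frame_translation(transcript)` on each noncoding RNA sequence would give candidate protein entries that could be written to a FASTA file for a database search; the actual database construction used in the study may differ.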

Keywords: Haploid cell lines; Proteogenomics; RNA-Seq; lncRNA; missing protein.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cell Line
  • Haploidy*
  • Humans
  • Proteogenomics / methods*
  • Proteome / analysis
  • Proteome / genetics*
  • RNA, Untranslated / genetics
  • Sequence Analysis, RNA / methods
  • Tandem Mass Spectrometry / methods
  • Transcriptome*


Substances

  • Proteome
  • RNA, Untranslated