Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project

J Proteome Res. 2013 Jun 7;12(6):2414-21. doi: 10.1021/pr300825v. Epub 2013 Jan 11.


The Chromosome-Centric Human Proteome Project (C-HPP) is an international effort to create an annotated proteomic catalog for each chromosome. The first step of C-HPP is to find evidence of expression for all proteins encoded on each chromosome. C-HPP also prioritizes particular protein subsets, such as those with post-translational modifications (PTMs) and those found in low abundance. As participants in C-HPP, we integrated proteomic and phosphoproteomic analysis results from chromosome-independent biomarker discovery research to create a chromosome-based list of proteins and phosphorylation sites. Data were integrated from five independent colorectal cancer (CRC) samples (three types of clinical tissue and two types of cell lines) and led to the identification of 11,278 proteins, including 8,305 phosphoproteins, and 28,205 phosphorylation sites; all of these were categorized on a chromosome-by-chromosome basis. In total, we identified 3,033 "missing proteins" (proteins in the neXtProt database that currently lack evidence by mass spectrometry) and 12,852 phosphorylation sites not registered in the PhosphoSitePlus database. Our in-depth phosphoproteomic study represents a significant contribution to C-HPP. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD000089.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cell Line, Tumor
  • Chromosomes, Human / metabolism*
  • Colorectal Neoplasms / chemistry*
  • Colorectal Neoplasms / genetics
  • Colorectal Neoplasms / metabolism
  • Databases, Protein*
  • Gene Expression
  • Gene Expression Profiling
  • Genome, Human
  • Human Genome Project*
  • Humans
  • Mass Spectrometry
  • Molecular Sequence Data
  • Neoplasm Proteins / genetics
  • Neoplasm Proteins / isolation & purification*
  • Neoplasm Proteins / metabolism
  • Phosphopeptides / isolation & purification*
  • Phosphopeptides / metabolism
  • Phosphorylation
  • Proteome / genetics
  • Proteome / isolation & purification*
  • Proteome / metabolism

Substances

  • Neoplasm Proteins
  • Phosphopeptides
  • Proteome