Data management and data integration in the HUPO plasma proteome project

Methods Mol Biol. 2011;696:247-57. doi: 10.1007/978-1-60761-987-1_15.

Abstract

The Human Plasma Proteome Project (HPPP) is an international collaboration coordinated by the Human Proteome Organisation (HUPO). Its Pilot Phase generated the 2005 Proteomics special issue "Exploring the Human Plasma Proteome" (Omenn et al. Proteomics 5:3226-3245, 2005) and a book with the same title (Omenn GS (ed) (2006) Exploring the human plasma proteome. Wiley-Liss, Weinheim, pp 372). Data management for the Pilot Phase covered the collection, integration, analysis, and dissemination of findings from participating laboratories and data repositories. Many investigators face the same challenges when integrating data from complex, dynamic serum and plasma specimens. The HPPP workflow assembled a representative Core Dataset of 3,020 protein identifications, overcoming ambiguity and redundancy in the heterogeneous contributed identifications, as well as redundancy and updates in the protein sequence databases. The results were made available from the University of Michigan with alternative thresholds, yielding a range of numbers of protein identifications. Data were submitted to EBI/PRIDE and to ISB/PeptideAtlas. The current phase of the HPPP employs ProteomeXchange to link submission of well-annotated primary datasets to EBI/PRIDE, distributed file sharing by Tranche/ProteomeCommons.org, and reanalysis from the primary raw spectra at ISB/PeptideAtlas. Such human plasma proteome datasets are available for data-mining comparisons with the proteomes of other organs and biofluids in health and disease.
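The effect of alternative thresholds on the number of protein identifications can be illustrated with a minimal sketch. It is not the project's actual workflow: the record layout, the accession and peptide strings, the function names, and the "minimum number of distinct peptides" criterion are all assumptions chosen for illustration. The sketch pools identifications from several laboratories, collapses redundant peptide reports, and counts the proteins that pass each threshold.

```python
from collections import defaultdict

# Hypothetical toy records: (laboratory, protein accession, peptide sequence).
# All values are invented for illustration; none are project data.
RECORDS = [
    ("lab_A", "ACC1", "PEPTIDEK"),
    ("lab_A", "ACC1", "SAMPLER"),
    ("lab_B", "ACC1", "PEPTIDEK"),  # same peptide reported by a second lab
    ("lab_B", "ACC2", "ANTHERK"),
    ("lab_C", "ACC3", "MADEVPK"),
    ("lab_C", "ACC3", "TESTINGR"),
    ("lab_C", "ACC3", "MADEVPK"),   # duplicate report within one lab
]


def distinct_peptides_per_protein(records):
    """Pool all laboratories and keep each peptide once per protein."""
    peptides = defaultdict(set)
    for _lab, accession, peptide in records:
        peptides[accession].add(peptide)
    return dict(peptides)


def count_identifications(records, min_peptides):
    """Count proteins supported by at least `min_peptides` distinct peptides."""
    per_protein = distinct_peptides_per_protein(records)
    return sum(1 for peps in per_protein.values() if len(peps) >= min_peptides)


if __name__ == "__main__":
    # A stricter threshold yields fewer protein identifications.
    for threshold in (1, 2, 3):
        print(threshold, count_identifications(RECORDS, threshold))
```

Using sets to hold peptides removes redundancy both within and across laboratories before any threshold is applied, so a peptide reported several times does not inflate the evidence for its protein.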

MeSH terms

  • Algorithms
  • Blood Proteins / analysis*
  • Cooperative Behavior
  • Database Management Systems*
  • Databases, Protein*
  • Humans
  • Immunoassay
  • Mass Spectrometry
  • Peptides / analysis
  • Proteome / analysis*
  • Proteomics
  • Software

Substances

  • Blood Proteins
  • Peptides
  • Proteome