State of the Human Proteome in 2014/2015 As Viewed through PeptideAtlas: Enhancing Accuracy and Coverage through the AtlasProphet

J Proteome Res. 2015 Sep 4;14(9):3461-73. doi: 10.1021/acs.jproteome.5b00500. Epub 2015 Jul 24.

Abstract

The Human PeptideAtlas is a compendium of the highest quality peptide identifications from over 1000 shotgun mass spectrometry proteomics experiments collected from many different laboratories, all reanalyzed through a uniform processing pipeline. The latest 2015-03 build contains substantially more input data than past releases, is mapped to a recent version of our merged reference proteome, and uses improved informatics processing together with the newly developed AtlasProphet to provide the highest quality results. The build is derived from over 133 million peptide-spectrum matches identifying more than 1 million distinct peptides, with AtlasProphet used to characterize and classify the protein matches. Within the set of ∼20,000 neXtProt primary entries, 14,070 (70%) are confidently detected in the latest build, 5% are ambiguous, and 9% are redundant, leaving just 16% (3166) of proteins with no mapping detections. Improved handling of the detection and presentation of single amino-acid variants (SAAVs) reveals 5326 uniquely mapping SAAVs across 2794 proteins. With such a large amount of data, the control of false positives is a challenge. We present the methodology and results for maintaining rigorous quality, along with a discussion of the implications of the remaining sources of error in the build.
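
The coverage figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It uses just the numbers stated above (the four percentages and the two counts). Because the abstract gives the neXtProt total only as "∼20,000", the code brackets the total implied by each count rather than assuming an exact value. The helper name `implied_total_range` is hypothetical, and the treatment of rounding to whole percent is an assumption.

```python
# Figures as stated in the abstract. The exact number of neXtProt primary
# entries is not given there ("~20,000"), so it is only bracketed here.
shares = {"confident": 70, "ambiguous": 5, "redundant": 9, "undetected": 16}
counts = {"confident": 14070, "undetected": 3166}


def implied_total_range(count, pct):
    """Approximate range of totals N for which count / N rounds to pct percent."""
    low = count / ((pct + 0.5) / 100)   # smaller N would round above pct
    high = count / ((pct - 0.5) / 100)  # larger N would round below pct
    return low, high


# The four categories should partition the reference set.
total_share = sum(shares.values())

# Totals compatible with both stated (count, percentage) pairs.
ranges = {name: implied_total_range(counts[name], shares[name]) for name in counts}
overlap = (
    max(low for low, _ in ranges.values()),
    min(high for _, high in ranges.values()),
)

print("sum of shares (%):", total_share)
print("implied total between %.0f and %.0f entries" % overlap)
```

Under these assumptions the four shares add up to 100%, and both stated counts are compatible with a total close to 20,000 entries.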

Keywords: Human Proteome Project; PeptideAtlas; observed proteome; repositories; shotgun proteomics; tandem mass spectrometry.

Publication types

  • Research Support, American Recovery and Reinvestment Act
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Databases, Protein*
  • Humans
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Proteomics*
  • Sequence Homology, Amino Acid

Substances

  • Proteins