Evaluating experimental bias and completeness in comparative phosphoproteomics analysis

PLoS One. 2011;6(8):e23276. doi: 10.1371/journal.pone.0023276. Epub 2011 Aug 10.

Abstract

Unraveling the functional dynamics of phosphorylation networks is a crucial step in understanding the way in which biological networks form a living cell. Recently there has been an enormous increase in the number of measured phosphorylation events. Nevertheless, comparative and integrative analysis of phosphoproteomes is confounded by incomplete coverage and biases introduced by different experimental workflows. As a result, we cannot tell whether phosphosites identified in only one or two samples are the result of condition- or species-specific phosphorylation, or reflect missing data. Here, we evaluate the impact of incomplete phosphoproteomics datasets on comparative analysis, and we present bioinformatics strategies to quantify the impact of different experimental workflows on measured phosphoproteomes. We show that plotting the saturation in observed phosphosites across replicates provides a reproducible picture of the extent of a particular phosphoproteome. Even so, we are far from a complete picture of the total human phosphoproteome. The impact of different experimental techniques on the similarity between phosphoproteomes can be estimated by comparing datasets from different experimental pipelines to a common reference. Our results show that comparative analysis is most powerful when datasets have been generated using the same experimental workflow. We show this experimentally by measuring the tyrosine phosphoproteome from Caenorhabditis elegans and comparing it to the tyrosine phosphoproteome of HeLa cells, resulting in an overlap of about 4%. This overlap between very different organisms represents a three-fold increase compared with datasets from older studies, in which different workflows were used. The strategies we suggest enable an estimation of the impact of differences in experimental workflows on the overlap between datasets. This will allow us to perform comparative analyses not only on datasets specifically generated for this purpose, but also to extract insights through comparative analysis of the ever-increasing wealth of publicly available phosphorylation data.
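
The two strategies named in the abstract, saturation of observed phosphosites across replicates and overlap with a common reference, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the site identifiers (such as "P1:S10"), the averaging over replicate orderings, and the choice to normalize overlap by the size of the reference set are all assumptions made for illustration; the abstract does not specify how either quantity was computed.

```python
from itertools import permutations
from statistics import mean


def saturation_curve(replicates):
    """Cumulative number of distinct phosphosites after each replicate, in the given order."""
    seen = set()
    curve = []
    for sites in replicates:
        seen |= set(sites)
        curve.append(len(seen))
    return curve


def mean_saturation_curve(replicates):
    """Average the cumulative curve over every ordering of the replicates.

    Assumption: averaging over orderings removes the dependence on which
    replicate happens to come first. Only practical for a few replicates.
    """
    curves = [saturation_curve(order) for order in permutations(replicates)]
    return [mean(points) for points in zip(*curves)]


def overlap_fraction(dataset, reference):
    """Fraction of reference phosphosites that are also observed in the dataset.

    Assumption: overlap is normalized by the reference size; other
    normalizations (e.g. by the union) are equally plausible.
    """
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(dataset) & reference) / len(reference)


# Hypothetical toy data: each replicate is a set of phosphosite identifiers.
replicates = [
    {"P1:S10", "P1:T22", "P2:Y5"},
    {"P1:S10", "P2:Y5", "P3:S7"},
    {"P1:T22", "P3:S7", "P4:Y99"},
]

print(saturation_curve(replicates))
print(mean_saturation_curve(replicates))
```

A curve that flattens as replicates are added suggests that most detectable sites for that workflow have been observed, whereas a curve that keeps rising suggests the dataset is still incomplete. Computing `overlap_fraction` for datasets from several pipelines against one shared reference gives a rough way to compare how much each workflow affects the measured similarity.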

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • Animals
  • Bias
  • Caenorhabditis elegans / metabolism
  • HeLa Cells
  • Humans
  • Mass Spectrometry
  • Phosphoproteins / metabolism*
  • Proteome / metabolism
  • Proteomics / methods*
  • Reproducibility of Results
  • Species Specificity

Substances

  • Amino Acids
  • Phosphoproteins
  • Proteome