Statistics for proteomics: a review of tools for analyzing experimental data

Proteomics. 2006 Sep;6 Suppl 2:48-55. doi: 10.1002/pmic.200600554.


Most proteomics experiments use high-throughput technologies such as 2-DE, MS or protein arrays to measure the expression levels of thousands of proteins simultaneously. Such experiments yield large, high-dimensional data sets that usually reflect not only biological but also technical and experimental factors. Statistical tools are essential for evaluating these data and preventing false conclusions. Here, we give an overview of some typical statistical tools for proteomics experiments. In particular, we present methods for data preprocessing (e.g. calibration, missing value estimation and outlier detection), for comparing protein expression between groups (e.g. detection of differentially expressed proteins or classification of new observations) and for detecting dependencies between proteins (e.g. protein clusters or networks). We also discuss sample size planning for some of these methods.
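
As an illustration of why multiplicity matters when thousands of proteins are tested at once, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-protein p-values. The abstract does not name a specific correction method, so the choice of Benjamini-Hochberg is an assumption made here for illustration, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level alpha.

    Assumption: Benjamini-Hochberg is used as one common choice; the review
    abstract does not specify a particular correction method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the hypotheses belonging to the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-protein p-values (illustrative only, not from the review).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

In this toy input, five p-values fall below 0.05 when each is judged on its own, but only the two smallest pass the step-up thresholds. This is the kind of false-positive control the abstract alludes to.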

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Data Interpretation, Statistical*
  • Electrophoresis, Gel, Two-Dimensional / methods
  • False Negative Reactions
  • False Positive Reactions
  • Gene Expression Profiling / methods
  • Isotope Labeling / methods
  • Mass Spectrometry
  • Mitogen-Activated Protein Kinases / classification
  • Probability
  • Proteins / classification
  • Proteomics / methods*

Substances


  • Proteins
  • Mitogen-Activated Protein Kinases