A comprehensive full factorial LC-MS/MS proteomics benchmark data set

Proteomics. 2012 Aug;12(14):2276-81. doi: 10.1002/pmic.201100284.


An important prerequisite for the development and benchmarking of novel analysis methods is a well-designed comprehensive LC-MS/MS data set. Here, we present our data set consisting of 59 LC-MS/MS analyses of 50 protein samples extracted individually from Escherichia coli K12 and spiked with different concentrations of bovine carbonic anhydrase II and/or chicken ovalbumin, according to a 2 × 3 full factorial design. Using the well-annotated and commonly used E. coli proteome as the sample background ensures that the complexity of the data is on a par with most current proteomic analyses. Data were acquired over a 2-month period using multiple reversed-phase columns and instrument calibrations to include real-life challenges faced when analyzing large proteomics data sets. Moreover, so-called "ground truth" data, consisting of LC-MS/MS measurements of the pure spikes, are included in the data set. The current manuscript describes this comprehensive benchmark data set for the future development and evaluation of analysis methods and software.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Carbonic Anhydrase II / chemistry
  • Cattle
  • Chickens
  • Chromatography, Liquid / methods*
  • Databases, Protein*
  • Escherichia coli Proteins / chemistry
  • Ovalbumin / chemistry
  • Peptide Fragments / chemistry
  • Proteome / chemistry*
  • Proteomics / methods*
  • Tandem Mass Spectrometry / methods*

Substances

  • Escherichia coli Proteins
  • Peptide Fragments
  • Proteome
  • Ovalbumin
  • Carbonic Anhydrase II