A proteomics sample metadata representation for multiomics integration and big data analysis

Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3.

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
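The core idea is a tab-delimited sample table that ties each sample's metadata to the data files acquired from it. The sketch below assumes one row per sample/file pair. The column names (`source name`, `characteristics[organism]`, `assay name`, `comment[data file]`) are modeled on the SDRF-Proteomics convention, but they are only an illustrative subset and not the specification's full required set. The helpers `read_sample_table`, `missing_columns` and `files_per_sample` are hypothetical and are not part of the validation tools described in the paper.

```python
import csv
import io

# Hypothetical subset of required columns, modeled on the SDRF-Proteomics
# naming convention; the real specification defines its own required set.
REQUIRED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
]


def read_sample_table(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse tab-delimited sample metadata into a header and one dict per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)  # fieldnames is populated once reading has started
    header = list(reader.fieldnames or [])
    return header, rows


def missing_columns(header: list[str]) -> list[str]:
    """Return the required columns that are absent from the header."""
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


def files_per_sample(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group data files under the sample (source name) they came from."""
    mapping: dict[str, list[str]] = {}
    for row in rows:
        mapping.setdefault(row["source name"], []).append(row["comment[data file]"])
    return mapping


# Toy example: one sample measured in two fractions, another in one.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1_fr1.raw\n"
    "sample 1\tHomo sapiens\trun 2\tsample1_fr2.raw\n"
    "sample 2\tHomo sapiens\trun 3\tsample2_fr1.raw\n"
)

if __name__ == "__main__":
    header, rows = read_sample_table(EXAMPLE)
    print(missing_columns(header))
    print(files_per_sample(rows))
```

Keeping one row per sample/file pair makes the sample-to-file relationship explicit. A reanalysis can then recover which raw files belong to which biological sample without reading the original publication.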

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Big Data
  • Data Analysis*
  • Databases, Protein*
  • Humans
  • Metadata*
  • Proteomics*
  • Reproducibility of Results
  • Software
  • Transcriptome