Review
Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):88-97.
doi: 10.1016/j.bbapap.2013.04.004. Epub 2013 Apr 12.

A Tutorial for Software Development in Quantitative Proteomics Using PSI Standard Formats

Free PMC article

Faviel F Gonzalez-Galarza et al. Biochim Biophys Acta.

Abstract

The Human Proteome Organisation - Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards (mzML, mzIdentML and mzQuantML) that can be used to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. In this tutorial, we briefly describe the content of each data model in sufficient detail for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs) developed in Java for each of these standards, which make it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.

Keywords: APIs; Quantitative proteomics; Software; Standard formats.

Figures

Fig. 1
Prototypical workflow for a label free quantitative analysis, showing which stages are covered by different PSI formats.
Fig. 2
(A) A portion of an example mzML file showing file-level metadata (lines 1–9), a single spectrum (lines 12 onwards), metadata for the spectrum (lines 13–21), details of a given scan (lines 24–34) and raw m/z data (line 42). (B) Examples of Java code snippets that use jmzML to extract particular details from an mzML raw file. Full source code is available at: http://code.google.com/p/psi-standard-formats-tutorial/.
Fig. 3
Example code showing how an extracted ion chromatogram (XIC) can be generated from an mzML file, using jmzML. The code takes input parameters of an RT and m/z range.
Fig. 4
(A) Example of mzIdentML capturing peptide-spectrum matches (PSMs) in SpectrumIdentificationItem (SII). SII has references to the Peptide sequence and PeptideEvidence. PeptideEvidence is a one-to-many mapping from a Peptide sequence to proteins, captured in DBSequence. (B) Code snippets using jmzIdentML to retrieve all PSMs from a file. Note: jmzIdentML has a configuration file that allows references between objects to be switched on or off (auto-resolving). In the example, all object references have been switched off (lowest memory overhead), which requires the use of internal HashMaps to retrieve objects by their unique ID.
Fig. 5
An overview of how protein groups are represented in an mzIdentML file, showing two proteins in one group: the first (PDH_8) has been identified by 11 peptides in many spectra (not all shown), while the second (PDH_950) has been identified by only one peptide (DETVWEKPLR) in a single spectrum. This peptide is shared with PDH_8, so PDH_950 is flagged as a “sequence sub-set protein” and PDH_8 as the “anchor protein” for the group. (A) Snippet of mzIdentML; (B) graphical representation of the groups shown.
Fig. 6
A workflow showing the encoding of quantitative data in mzQuantML for a label-free experiment in which 12 replicates are analysed to produce abundance values at the protein and peptide level (peptide level QuantLayer not shown).
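
The extracted ion chromatogram (XIC) described in the caption of Fig. 3 can be illustrated without any library: keep the spectra whose retention time (RT) falls inside the requested RT range and, for each one, sum the intensities of the peaks inside the requested m/z range. The following is a minimal sketch under stated assumptions, not the tutorial's own code. Ms1Spectrum, XicPoint and XicSketch are hypothetical in-memory types introduced here for illustration (they are not jmzML classes), the RT unit is assumed to be seconds, and summing intensities is one common choice that the caption does not specify. In a real pipeline the m/z and intensity arrays would be filled from spectra read with jmzML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one MS1 spectrum; NOT a jmzML type. */
final class Ms1Spectrum {
    final double retentionTime; // assumed to be in seconds
    final double[] mz;          // peak m/z values
    final double[] intensity;   // peak intensities, parallel to mz

    Ms1Spectrum(double retentionTime, double[] mz, double[] intensity) {
        if (mz.length != intensity.length) {
            throw new IllegalArgumentException("mz and intensity arrays must have equal length");
        }
        this.retentionTime = retentionTime;
        this.mz = mz;
        this.intensity = intensity;
    }
}

/** Hypothetical XIC data point: one summed intensity per retained spectrum. */
final class XicPoint {
    final double retentionTime;
    final double summedIntensity;

    XicPoint(double retentionTime, double summedIntensity) {
        this.retentionTime = retentionTime;
        this.summedIntensity = summedIntensity;
    }
}

final class XicSketch {
    /**
     * Builds an XIC from spectra within [rtMin, rtMax] by summing the
     * intensity of every peak whose m/z lies within [mzMin, mzMax].
     * Spectra with no peak in the m/z window contribute a point of 0.0.
     */
    static List<XicPoint> extract(List<Ms1Spectrum> spectra,
                                  double rtMin, double rtMax,
                                  double mzMin, double mzMax) {
        List<XicPoint> xic = new ArrayList<>();
        for (Ms1Spectrum s : spectra) {
            if (s.retentionTime < rtMin || s.retentionTime > rtMax) {
                continue; // outside the requested RT window
            }
            double sum = 0.0;
            for (int i = 0; i < s.mz.length; i++) {
                if (s.mz[i] >= mzMin && s.mz[i] <= mzMax) {
                    sum += s.intensity[i];
                }
            }
            xic.add(new XicPoint(s.retentionTime, sum));
        }
        return xic;
    }

    public static void main(String[] args) {
        // Made-up peaks purely for demonstration; not data from the article.
        List<Ms1Spectrum> spectra = Arrays.asList(
            new Ms1Spectrum(10.0, new double[] {400.0, 500.2, 500.3}, new double[] {5.0, 10.0, 20.0}),
            new Ms1Spectrum(20.0, new double[] {500.25, 600.0}, new double[] {7.0, 9.0}));
        for (XicPoint p : extract(spectra, 5.0, 30.0, 500.0, 501.0)) {
            System.out.println(p.retentionTime + "\t" + p.summedIntensity);
        }
    }
}
```

The sketch holds all spectra in memory for brevity. For large files, the same loop could run over spectra as they are read one at a time, so that only the current spectrum and the growing XIC are kept.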
