Review
Mol Biosyst, 8 (10), 2535-44

The Spectral Networks Paradigm in High Throughput Mass Spectrometry

Adrian Guthals et al. Mol Biosyst.

Abstract

High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS2) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS2 data from related peptides, the mainstream approach for peptide identification is still the nearly two-decade-old approach of matching one MS2 spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and false negative rates. The spectral networks paradigm for analysis of MS2 spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids, and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
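
The spectrum-against-spectrum matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: plain cosine similarity on binned peaks stands in for the spectral alignment score (which would also tolerate modification mass shifts), and the function names, the threshold of 0.6, and the toy spectra are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def binned(peaks, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into {bin index: summed intensity}."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def spectral_network(spectra, threshold=0.6):
    """Nodes are spectra; an edge joins two spectra whose similarity passes the threshold.

    Returns the edge list and the connected components (groups of related spectra).
    """
    vecs = {name: binned(peaks) for name, peaks in spectra.items()}
    edges = [(a, b) for a, b in combinations(sorted(vecs), 2)
             if cosine(vecs[a], vecs[b]) >= threshold]

    # Union-find to collect connected components.
    parent = {name: name for name in vecs}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for name in vecs:
        groups.setdefault(find(name), []).append(name)
    components = sorted(sorted(group) for group in groups.values())
    return edges, components

# Hypothetical toy spectra: s1 and s2 share two of three peaks, s3 is unrelated.
toy_spectra = {
    "s1": [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)],
    "s2": [(100.0, 1.0), (200.0, 1.0), (350.0, 1.0)],
    "s3": [(500.0, 1.0), (600.0, 1.0)],
}
edges, components = spectral_network(toy_spectra)
```

In this toy case the grouping happens before any identification is attempted, which mirrors the second of the three differences listed in the abstract; a consensus identification would then be sought per component rather than per spectrum.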

Figures

Fig. 1
Discovery and identification of post-translational modifications through spectral networks; (a) Spectral alignment between modified and unmodified variants of the peptide TETMA (b-ions shown in blue, y-ions in red, blue/red lines track consecutively matched b/y-ions); (b) Grouped modification states of the peptide MDVTIQHPWFK from a sample of cataractous lenses. Nodes in the spectral network represent individual MS2 spectra and edges between nodes represent significant spectral alignments such as that shown in part (a); (c) Spectra assembled in the spectral network for TNSMVTLGCLVK with diverse Cysteine modifications on a monoclonal antibody. Each arrow corresponds to the propagation of a sequence and/or PTM from an identified spectrum to an unidentified spectrum (repeated arrows are iterative propagations). Arrow colors correspond to types of modifications transferred.
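
The alignment in panel (a) can be sketched with nominal integer masses. The caption does not say which modification TETMA carries, so the +16 Da shift on methionine below is purely an assumption, as are the function names; only the prefix (b-type) ladder is modeled and ion-type offsets are ignored.

```python
# Nominal integer residue masses in Da for the residues of TETMA.
RESIDUE_MASS = {"T": 101, "E": 129, "M": 131, "A": 71}

def prefix_masses(peptide, mods=None):
    """Cumulative residue masses (a b-ion-like ladder without ion offsets).

    mods maps a 0-based residue position to a mass shift.
    The last entry of the ladder is the full residue mass of the peptide.
    """
    mods = mods or {}
    total, ladder = 0, []
    for position, residue in enumerate(peptide):
        total += RESIDUE_MASS[residue] + mods.get(position, 0)
        ladder.append(total)
    return ladder

def align(unmodified, modified):
    """Split the unmodified ladder into peaks matched directly and peaks matched
    after shifting by the parent mass difference."""
    delta = modified[-1] - unmodified[-1]
    modified_set = set(modified)
    same = [m for m in unmodified if m in modified_set]
    shifted = [m for m in unmodified
               if m not in modified_set and m + delta in modified_set]
    return delta, same, shifted

plain = prefix_masses("TETMA")
# Assumed modification for illustration: +16 Da on M (position 3).
variant = prefix_masses("TETMA", mods={3: 16})
delta, same, shifted = align(plain, variant)
```

Under these assumptions the ladder matches directly up to the residue before M and matches with a 16 Da offset from M onward, so the point where the alignment switches from unshifted to shifted peaks localizes the modification without consulting a sequence database.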
Fig. 2
Shotgun Protein Sequencing (SPS) via assembly of tandem mass spectra; (a) Spectral alignment between spectra for peptide WSCILMEPKR (purple), PEWSCILMEPKR (green), WSCILMEPK (red), WSCILMoxEPK (cyan); Mox represents oxidized Methionine. Matching peaks in spectral alignments become pairwise gluing instructions between every pair of aligned spectra. (b) Protein contig resulting from 24 spectra from a monoclonal antibody (aBTLA heavy chain). Each spectrum is shown superimposed with a sequence of arrows indicating its sequence of recovered masses; modified variants of the consensus sequence are indicated by red arrows (6 different modifications on 7 spectra). (c) The complete aBTLA heavy chain sequence recovered by Comparative SPS; highlighted sections were covered by protein contigs (95% coverage) and the missing amino acids were obtained from homologous protein sequences.
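
How two overlapping spectra from panel (a) could be glued is sketched below on idealized prefix-mass ladders. This is a simplified stand-in, not the SPS implementation: it assumes nominal integer masses, unmodified cysteine, noise-free ladders, and hypothetical function names, and it picks the offset supported by the most peak pairs.

```python
from collections import Counter

# Nominal integer residue masses in Da (cysteine assumed unmodified).
RESIDUE_MASS = {"W": 186, "S": 87, "C": 103, "I": 113, "L": 113,
                "M": 131, "E": 129, "P": 97, "K": 128, "R": 156}

def prefix_masses(peptide):
    """Cumulative residue masses, one entry per prefix of the peptide."""
    total, ladder = 0, []
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder

def best_shift(ladder_a, ladder_b):
    """Return the mass offset (b minus a) supported by the most peak pairs,
    together with the number of supporting pairs."""
    votes = Counter(b - a for a in ladder_a for b in ladder_b)
    return votes.most_common(1)[0]

short = prefix_masses("WSCILMEPKR")
extended = prefix_masses("PEWSCILMEPKR")
shift, support = best_shift(short, extended)

# Gluing: place the shorter ladder at the winning offset and merge the peaks.
glued = sorted(set(extended) | {m + shift for m in short})
```

Here the winning offset equals the combined mass of the extra N-terminal residues P and E (97 + 129 = 226 Da), and every peak of the shorter ladder lands on a peak of the extended one, so the merged ladder is a small contig covering both spectra.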
Fig. 3
Analysis of the cyclic peptide Seglitide. (a) The circular structure of Seglitide is schematically illustrated with each residue represented by a different color (slice sizes not scaled to corresponding masses of the residues). A+14 denotes a non-standard residue with integer mass 71 + 14 = 85 Da. (b) MS2 fragmentation of Seglitide generates up to 6 linear peptides representing different rotated variants of the same cyclic peptide. (c) Theoretical spectrum for Seglitide by superposition of the fragment masses of the linearized peptides. (d) Experimental spectrum of Seglitide resulting from a mixture of 6 linear peptides (the peaks corresponding to fragment ions are shown in red). (e) Spectral network assembled from Seglitide MSn spectra and used for de novo sequencing with unknown amino acid masses.
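
The superposition in panels (b) and (c) can be sketched as follows. The caption only states that Seglitide yields up to 6 linear variants and contains a residue of mass 71 + 14 = 85 Da; the other five integer masses and their order below are assumptions for illustration, as is the function name.

```python
ALANINE = 71
METHYL_SHIFT = 14  # the "A+14" residue from the caption: 71 + 14 = 85 Da

# Assumed residue order and nominal masses for a 6-residue cyclic peptide.
SEGLITIDE_LIKE = [ALANINE + METHYL_SHIFT, 163, 186, 128, 99, 147]

def cyclic_theoretical_spectrum(residue_masses):
    """Superimpose the prefix masses of every rotation of a cyclic peptide.

    Each rotation corresponds to opening the ring at a different bond,
    which gives one linear peptide per residue.
    """
    n = len(residue_masses)
    masses = set()
    for start in range(n):
        linear = residue_masses[start:] + residue_masses[:start]
        total = 0
        for mass in linear[:-1]:  # proper prefixes only; the full ring is the parent mass
            total += mass
            masses.add(total)
    return sorted(masses)

parent_mass = sum(SEGLITIDE_LIKE)
spectrum = cyclic_theoretical_spectrum(SEGLITIDE_LIKE)
```

Because every fragment is a contiguous arc of the ring, its complement is also an arc, so the theoretical spectrum is symmetric around half the parent mass. De novo sequencing with unknown residue masses then amounts to searching for a ring of masses whose superimposed ladders explain the observed peaks.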
Fig. 4
Molecular spectral network of a partial Bacillus subtilis secretome; nodes indicate MS2 spectra of initially unknown compounds from any class of molecules (no peptide-specific assumptions were made), and edges indicate significant similarity between the MS2 fragmentation patterns of different spectra, mostly between intermediates/variants of the same compounds. Selected molecular structures are shown in black, overlaid on the network next to the correspondingly highlighted network clusters.
