Proteomics. 2010 Jul;10(14):2712-8.
doi: 10.1002/pmic.200900473.

Computational Analysis of Unassigned High-Quality MS/MS Spectra in Proteomic Data Sets

Free PMC article

Kang Ning et al. Proteomics.


In a typical shotgun proteomics experiment, a significant number of high-quality MS/MS spectra remain "unassigned." The main focus of this work is to improve our understanding of various sources of unassigned high-quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set.


Figure 1. Overview of the iterative peptide identification strategy
Proteins are digested into peptides, and the peptides are sequenced using MS/MS. Acquired spectra are analyzed using conventional database searching. Peptide identifications are processed using PeptideProphet and ProteinProphet. A spectral quality assessment tool is used to select unassigned high-quality spectra. These spectra are reanalyzed using X! Tandem and InsPecT (normal and blind mode) against the subset protein database, and using the SpectraST spectral library search tool. The remaining unassigned spectra are searched against the translated genomic database to identify novel peptides and peptide polymorphisms.
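The staged reanalysis described in this caption can be pictured as a cascade in which each stage only sees the spectra that every earlier stage left unassigned. The following is a minimal sketch of that control flow, not the authors' implementation: the function `iterative_identification`, the stage names, and the toy lookup tables are all hypothetical, and each real stage (X! Tandem, InsPecT, SpectraST, genomic search) is stood in for by a simple callable that returns a peptide or `None`.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical stage signature: spectrum ID in, peptide sequence out,
# or None when the stage finds no confident match.
SearchStage = Callable[[str], Optional[str]]


def iterative_identification(
    spectrum_ids: Iterable[str],
    stages: List[Tuple[str, SearchStage]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Run the stages in order; pass only unassigned spectra to the next stage."""
    assigned: Dict[str, Tuple[str, str]] = {}
    remaining = list(spectrum_ids)
    for stage_name, search in stages:
        still_unassigned = []
        for spectrum_id in remaining:
            peptide = search(spectrum_id)
            if peptide is None:
                still_unassigned.append(spectrum_id)
            else:
                # Record which stage produced the assignment.
                assigned[spectrum_id] = (stage_name, peptide)
        remaining = still_unassigned
    return assigned, remaining


# Toy stand-ins for real search tools (assumed data, for illustration only).
subset_db_hits = {"s1": "PEPTIDEK"}
spectral_lib_hits = {"s2": "SAMPLER"}

stages = [
    ("subset_db", subset_db_hits.get),
    ("spectral_lib", spectral_lib_hits.get),
]
assigned, remaining = iterative_identification(["s1", "s2", "s3"], stages)
print(assigned)
print(remaining)
```

In this toy run, `s3` is left in `remaining`, which corresponds to the spectra that the described workflow would pass on to the translated genomic database search.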
Figure 2. Prevalence and categories of unassigned high-quality spectra
(a) The distribution of spectral quality scores plotted for all spectra (solid line), and separately for unassigned (dash-dot line) and assigned (short-dash line) spectra after the initial database search. (b) The ratio of spectra assigned to peptides of different types during reanalysis, plotted as a function of the spectral quality score. Here "percent total" refers to the proportion of spectra assigned to peptides of each type among the total number of initially unassigned spectra. The category ‘tryptic, subset db’ refers to spectra corresponding to unmodified tryptic peptides that were identified because of the reduced search space. The category ‘tryptic, spectral lib’ refers to spectra corresponding to unmodified tryptic peptides identified using spectral library searching, and includes some spectra that were also identified by other methods. Data are from the WCL fraction.
Figure 3. Additional analysis of peptide categories
(a) The ratio of proteins (among proteins of similar abundance, as measured using spectral counts) containing at least one modified peptide of a particular type (WCL fraction data). Shown are methionine oxidation (+16), N-terminal acetylation/carbamylation (+42), and pyroglutamic acid formation from N-terminal glutamic acid (−17.0). (b) Most frequent modifications and their normalized frequencies in the WCL, plasma membrane (PM), and raft fractions. (c) Novel peptides (according to the NCBI NR database) identified by the genomic database search and categorized by edit distance (WCL, plasma membrane, and raft fractions).
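Panel (c) groups novel peptides by edit distance. The caption does not say which definition was used, so the sketch below assumes the standard Levenshtein distance (the minimum number of single-residue substitutions, insertions, and deletions) between a novel peptide and a known sequence. The function name `edit_distance` and the example sequences are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, computed row by row."""
    # previous[j] holds the distance between the prefix of `a` processed
    # so far and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string costs i deletions
        for j, residue_b in enumerate(b, start=1):
            substitution_cost = 0 if residue_a == residue_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion from a
                    current[j - 1] + 1,                   # insertion into a
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


# A single-residue substitution (K -> R) gives an edit distance of 1.
print(edit_distance("PEPTIDEK", "PEPTIDER"))
```

Under this assumption, a peptide at distance 1 from a known sequence would be a candidate single amino acid polymorphism, while larger distances indicate more divergent sequences.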
