Computational framework for identification of intact glycopeptides in complex samples

Anal Chem. 2014 Jan 7;86(1):453-63. doi: 10.1021/ac402338u. Epub 2013 Dec 10.


Glycosylation is an important protein modification that involves enzymatic attachment of sugars to amino acid residues. Understanding the structure of these sugars and the effects of glycosylation is vital for developing indicators of disease development and progression. Although computational methods based on mass spectrometric data have proven effective in monitoring changes in the glycome, developing such methods for the glycoproteome is challenging, largely because of the inherent complexity of studying glycan structures together with their corresponding glycosylation sites. This paper introduces a computational framework for identifying intact N-linked glycopeptides, i.e., glycopeptides with N-linked glycans attached to their glycosylation sites, in complex proteome samples. Scoring algorithms are presented for tandem mass spectra of glycopeptides resulting from collision-induced dissociation (CID), higher-energy C-trap dissociation (HCD), and electron transfer dissociation (ETD) fragmentation modes. An empirical false-discovery rate estimation method, based on a target-decoy search approach, is derived for assigning confidence. Pooling multiple data sets further increases identification confidence. Using this framework, 103 highly confident N-linked glycopeptides from 53 sites across 33 glycoproteins were identified in complex human serum proteome samples using conventional proteomic platforms with standard depletion of the seven most abundant proteins. These results indicate that the method is ready to be used for characterizing site-specific protein glycosylation in complex samples.
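
The abstract does not spell out how the false-discovery rate is derived, so the following is only a minimal sketch of the generic target-decoy idea, not the paper's estimator. It assumes each spectrum match carries a score (higher is better) and a flag saying whether it came from the decoy database, and it estimates the FDR at a score cutoff as the number of decoy hits divided by the number of target hits. The function names, the data layout, and the example scores are all hypothetical.

```python
from typing import List, Optional, Tuple

# Each match is (score, is_decoy). This layout is an assumption made for
# illustration; it is not taken from the paper.
Match = Tuple[float, bool]


def fdr_at_threshold(matches: List[Match], threshold: float) -> float:
    """Estimate FDR among matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: report 1.0 if only decoys passed, else 0.0.
        return 1.0 if decoys > 0 else 0.0
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(matches: List[Match], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Walk candidate cutoffs from the highest score down, remembering the
    # lowest one that still satisfies the FDR bound.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            best = threshold
    return best


# Made-up scores, purely to exercise the functions.
EXAMPLE_MATCHES: List[Match] = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.0, False),
]

if __name__ == "__main__":
    print(fdr_at_threshold(EXAMPLE_MATCHES, 7.5))
    print(score_cutoff_for_fdr(EXAMPLE_MATCHES, 0.2))
```

Under this simple estimator, pooling matches from several data sets before choosing the cutoff enlarges both the target and decoy counts, which is one plausible way a pooled analysis could stabilize the estimate.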

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Cattle
  • Computational Biology / methods*
  • Glycopeptides / analysis
  • Glycopeptides / blood*
  • Glycopeptides / genetics*
  • Humans
  • Molecular Sequence Data
  • Swine

Substances

  • Glycopeptides