Large-scale characterization of intact N-glycopeptides using an automated glycoproteomic method

J Proteomics. 2014 Oct 14;110:145-54. doi: 10.1016/j.jprot.2014.08.006. Epub 2014 Aug 23.


Detailed characterization of site-specific glycosylation requires identifying both the glycan composition and its specific attachment site on the protein, which in turn requires identifying intact glycopeptides by mass spectrometry. In this study, we present an analytical and computational strategy for the high-throughput characterization of intact N-glycopeptides derived from complex proteome samples. N-glycopeptides were identified using spectra acquired for both intact glycopeptides and their deglycosylated counterparts. The Y1 ion (peptide+GlcNAc) was accurately determined from the spectra of intact glycopeptides, and the glycan structure was then identified by searching a constructed glycan database using the calculated molecular weights of the glycans and their fragment ions. The peptide sequences of intact glycopeptides were identified by matching the molecular weight calculated from the Y1 ion to that of deglycosylated peptides obtained from the same HILIC enrichment and identified in a separate LC-MS/MS analysis. A fully automated software platform integrates all of these steps for identifying intact N-glycopeptides. This platform was applied to the detailed characterization of site-specific glycosylation in HEK 293T cells, which led to the identification of 2249 unique intact N-glycopeptides. These intact glycopeptides revealed 1769 site-specific N-glycans on 453 glycosylation sites, demonstrating the high heterogeneity of glycosylation.
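The mass bookkeeping behind the Y1-based matching can be illustrated with a minimal sketch. It is not the authors' software: the function names, the 10 ppm tolerance, and the three-entry glycan table are assumptions made for illustration, and the deglycosylated peptide masses are assumed to be already expressed as the mass of the unmodified peptide backbone (any mass shift introduced by the deglycosylation step would need to be corrected first). Only precursor-level masses are used here; the fragment-ion matching mentioned in the abstract is not modeled.

```python
# Monoisotopic masses in Da (standard values).
PROTON = 1.007276
HEXNAC = 203.079373  # GlcNAc/HexNAc residue, C8H13NO5
HEX = 162.052824     # hexose residue, C6H10O5

# Hypothetical mini glycan database: name -> (HexNAc count, Hex count).
GLYCAN_DB = {
    "HexNAc2Hex5": (2, 5),
    "HexNAc2Hex6": (2, 6),
    "HexNAc4Hex5": (4, 5),
}


def neutral_mass(mz, charge):
    """Convert an observed m/z and charge state to a neutral mass."""
    return (mz - PROTON) * charge


def peptide_mass_from_y1(y1_mz, charge):
    """Y1 is peptide + one GlcNAc, so subtract one HexNAc residue."""
    return neutral_mass(y1_mz, charge) - HEXNAC


def within_ppm(observed, theoretical, tol_ppm):
    """True if the relative mass error is inside the ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def match_peptides(pep_mass, deglyco_peptides, tol_ppm=10.0):
    """Return deglycosylated peptides whose mass agrees with the Y1-derived mass."""
    return [seq for seq, mass in deglyco_peptides.items()
            if within_ppm(pep_mass, mass, tol_ppm)]


def match_glycans(precursor_mz, precursor_charge, pep_mass, tol_ppm=10.0):
    """Glycan mass = intact glycopeptide mass - peptide mass; look it up by mass."""
    glycan_mass = neutral_mass(precursor_mz, precursor_charge) - pep_mass
    hits = []
    for name, (n_hexnac, n_hex) in GLYCAN_DB.items():
        theoretical = n_hexnac * HEXNAC + n_hex * HEX
        if within_ppm(glycan_mass, theoretical, tol_ppm):
            hits.append(name)
    return hits


# Usage with made-up numbers: a 1000.5 Da peptide carrying HexNAc2Hex5.
pep = 1000.5
y1_mz = pep + HEXNAC + PROTON                             # singly charged Y1
prec_mz = (pep + 2 * HEXNAC + 5 * HEX + 2 * PROTON) / 2   # doubly charged precursor
pep_mass = peptide_mass_from_y1(y1_mz, 1)
peptide_hits = match_peptides(pep_mass, {"PEPTIDE_A": 1000.5, "PEPTIDE_B": 1000.6})
glycan_hits = match_glycans(prec_mz, 2, pep_mass)
```

Matching on the Y1-derived mass rather than on the intact precursor mass is what separates the peptide question from the glycan question: once the peptide mass is fixed, the remainder of the precursor mass belongs to the glycan.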

Biological significance: We present a fully automated software platform for the high-throughput characterization of intact N-glycopeptides derived from complex proteome samples. Intact glycopeptides and their deglycosylated forms were identified separately and then combined on the basis of the shared molecular weight of their peptide backbones. The strong correlation between their retention times effectively filtered out random matches. The reliability of this strategy was carefully evaluated, showing a probability of random matches of less than 1%. In total, 2249 intact glycopeptides were identified, which is by far the largest dataset among N-glycoproteomics studies.
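One way a retention-time correlation could be used to discard random mass matches is sketched below. The abstract does not describe the actual filter, so everything here is an assumption: a straight line is fitted by least squares to a set of trusted (deglycosylated RT, intact RT) pairs, and a candidate match is kept only if its intact retention time falls within a hypothetical tolerance of the predicted value.

```python
def fit_line(pairs):
    """Ordinary least-squares fit y = slope * x + intercept for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def filter_by_rt(candidates, slope, intercept, tol_min=1.0):
    """Keep candidates whose intact RT lies within tol_min of the fitted line.

    Each candidate is (label, deglycosylated_rt, intact_rt), times in minutes.
    """
    kept = []
    for label, deglyco_rt, intact_rt in candidates:
        predicted = slope * deglyco_rt + intercept
        if abs(intact_rt - predicted) <= tol_min:
            kept.append(label)
    return kept


# Usage with made-up retention times (minutes).
reference = [(10.0, 8.0), (20.0, 18.0), (30.0, 28.0)]  # trusted pairs
slope, intercept = fit_line(reference)
candidates = [
    ("match_A", 15.0, 13.1),  # close to the fitted trend
    ("match_B", 25.0, 40.0),  # far from the trend, likely a random mass match
]
kept = filter_by_rt(candidates, slope, intercept, tol_min=1.0)
```

Fitting the line on a trusted subset, rather than on all candidates, keeps random matches from distorting the trend they are later tested against.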

Keywords: Glycan structure; Mass spectrometry; N-glycoproteomics; Software platform.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Carbohydrate Sequence
  • Chromatography, Liquid / methods
  • Computer Simulation
  • Glycomics / methods*
  • Glycopeptides / chemistry*
  • High-Throughput Screening Assays / methods*
  • Mass Spectrometry / methods
  • Models, Chemical
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods
  • Polysaccharides / chemistry*
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods*


Substances

  • Glycopeptides
  • Polysaccharides