NetCGlyc 1.0: prediction of mammalian C-mannosylation sites

Glycobiology. 2007 Aug;17(8):868-76. doi: 10.1093/glycob/cwm050. Epub 2007 May 9.

Abstract

C-mannosylation is the attachment of an alpha-mannopyranose to a tryptophan via a C-C linkage. The sequence WXXW, in which the first Trp becomes mannosylated, has been suggested as a consensus motif for the modification, but only two-thirds of known sites follow this rule. We gathered a data set of 69 experimentally verified C-mannosylation sites from the literature and analyzed their sequence context. We found that, apart from Trp, Cys is also accepted in position +3. We also found a clear preference in position +1, where a small and/or polar residue (Ser, Ala, Gly, or Thr) is preferred and Phe or Leu is disfavored. The Protein Data Bank was searched for structural information, and five structures of C-mannosylated proteins were obtained. These showed that modified tryptophan residues are at least partly solvent exposed. A method predicting the location of C-mannosylation sites in proteins was developed using a neural network approach. The best overall network used a 21-residue sequence input window and information on the presence/absence of the WXXW motif. NetCGlyc 1.0 correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif itself, which identifies only 67% of positive sites. NetCGlyc 1.0 is available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome and found 2573 exported or transmembrane transcripts with at least one predicted C-mannosylation site.
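
The motif rule and input window described in the abstract can be illustrated with a minimal Python sketch. It is not the NetCGlyc implementation and contains no neural network: it only flags Trp residues followed by Trp or Cys at position +3 and extracts a 21-residue window around a site. The function names, the centring of the window on the candidate Trp, and the use of "X" as terminal padding are assumptions made for illustration; the abstract does not specify them.

```python
import re

# Trp at position 0, any two residues, then Trp or Cys at position +3.
# A lookahead is used so that overlapping motifs are all reported.
MOTIF = re.compile(r"(?=W..[WC])")


def candidate_sites(seq):
    """Return 0-based indices of Trp residues matching W-x-x-[W/C]."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


def window(seq, pos, size=21, pad="X"):
    """Return a `size`-residue window centred on `pos` (assumed layout).

    Positions beyond the sequence termini are filled with `pad`,
    a hypothetical placeholder symbol.
    """
    half = size // 2
    padded = pad * half + seq + pad * half
    return padded[pos:pos + size]


if __name__ == "__main__":
    seq = "AWSPWSEWTSCA"  # made-up sequence, not from the paper's data set
    for i in candidate_sites(seq):
        print(i, window(seq, i))
```

A regular expression like this reproduces only the consensus-style rule, which by the abstract's own figures misses a substantial share of true sites; the reported 93% accuracy comes from the trained network, not from motif matching.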

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Animals
  • Databases, Protein
  • Genome, Human
  • Glycoproteins / chemistry*
  • Glycoproteins / metabolism
  • Humans
  • Mammals
  • Mannose / analysis*
  • Mannose / metabolism
  • Models, Molecular
  • Molecular Sequence Data
  • Neural Networks, Computer*

Substances

  • Glycoproteins
  • Mannose