A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods

J Proteomics. 2017 Jun 23:163:67-75. doi: 10.1016/j.jprot.2017.03.026. Epub 2017 Apr 4.

Abstract

The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can unreliable and proteomic methods offer an alternative but require comprehensive and well annotated sequence databases which are lacking for gluten. A manually a curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten.

Significance: We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals.

Keywords: Celiac toxic motif; Database; Gluten; Proteomics; Triticum aestivum.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Celiac Disease / etiology
  • Databases, Protein* / standards
  • Databases, Protein* / trends
  • Diet, Gluten-Free
  • Gliadin / analysis
  • Glutens / analysis*
  • Humans
  • Proteomics / methods*

Substances

  • Glutens
  • Gliadin
  • glutenin