Computational detection of allergenic proteins attains a new level of accuracy with in silico variable-length peptide extraction and machine learning

Nucleic Acids Res. 2006;34(13):3779-93. doi: 10.1093/nar/gkl467. Epub 2006 Aug 23.

Abstract

Novel or new-in-context proteins reach the market in genetically modified foods, certain biopharmaceuticals and some household products, exposing humans to proteins that may elicit allergic responses. Accurate methods to detect allergens are therefore necessary to ensure consumer and patient safety. We demonstrate that a new level of accuracy in computational detection of allergenic proteins can be reached by presenting a novel detector, Detection based on Filtered Length-adjusted Allergen Peptides (DFLAP). The DFLAP algorithm extracts variable-length allergen sequence fragments and applies a modern machine learning technique, the support vector machine. In particular, this new detector shows hitherto unmatched specificity when challenged against the Swiss-Prot repository, without appreciable loss of sensitivity. DFLAP is also the first reported detector that successfully discriminates between allergens and non-allergens in protein families known to contain both categories. Allergenicity assessment of specific protein sequences of interest using DFLAP is available on request via ulfh@slv.se.
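The abstract names two ingredients, variable-length fragment extraction with filtering and a support vector machine, but gives no parameters. The following Python sketch illustrates only the general idea of the first ingredient and is not the published DFLAP procedure. The length range, the exact-substring filter against non-allergens, the ungapped identity score and all function names are assumptions made for illustration, and the sequences are toy strings rather than real proteins.

```python
from typing import Iterable, List, Set


def extract_fragments(sequence: str, min_len: int, max_len: int) -> Set[str]:
    """Collect every contiguous fragment whose length lies in [min_len, max_len]."""
    fragments = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            fragments.add(sequence[start:start + length])
    return fragments


def filter_fragments(fragments: Iterable[str], non_allergens: Iterable[str]) -> Set[str]:
    """Assumed filter: drop fragments that occur verbatim in any non-allergen."""
    background = list(non_allergens)
    return {f for f in fragments if not any(f in seq for seq in background)}


def best_identity(fragment: str, query: str) -> float:
    """Highest fraction of identical residues over all ungapped query windows."""
    n = len(fragment)
    best = 0.0
    for start in range(len(query) - n + 1):
        window = query[start:start + n]
        matches = sum(a == b for a, b in zip(fragment, window))
        best = max(best, matches / n)
    return best


def feature_vector(query: str, fragments: Iterable[str]) -> List[float]:
    """One score per retained fragment, in sorted fragment order."""
    return [best_identity(f, query) for f in sorted(fragments)]


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    allergen = "MKTAYLL"
    non_allergen = "GGKTAGG"
    kept = filter_fragments(extract_fragments(allergen, 3, 4), [non_allergen])
    print(sorted(kept))
    print(feature_vector("XXTAYLXX", kept))
```

Vectors of this kind, computed for labelled allergens and non-allergens, could then be used to train a support vector machine, for example scikit-learn's `sklearn.svm.SVC`. That step is omitted here so the sketch runs with the standard library alone.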

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Allergens / analysis*
  • Allergens / chemistry
  • Artificial Intelligence*
  • Computational Biology / methods*
  • Databases, Protein
  • Humans
  • Peptides / chemistry
  • Peptides / immunology
  • Peptides / isolation & purification
  • Proteins / chemistry
  • Proteins / immunology*
  • Reproducibility of Results
  • Sequence Analysis, Protein / methods*
  • Tropomyosin / chemistry
  • Tropomyosin / immunology

Substances

  • Allergens
  • Peptides
  • Proteins
  • Tropomyosin