Further evaluation of the utility of "sliding window" FASTA in predicting cross-reactivity with allergenic proteins

Regul Toxicol Pharmacol. 2009 Aug;54(3 Suppl):S20-5. doi: 10.1016/j.yrtph.2008.11.006. Epub 2008 Dec 11.


FAO/WHO has recommended that IgE cross-reactivity between a transgenic protein and an allergen be considered when there is greater than 35% identity over a sliding "window" of 80 amino acids. In previous work, we evaluated the false positive and false negative rates observed using the FAO/WHO criteria versus conventional, whole-protein FASTA analyses [Ladics, G.S., Bannon, G.A., Silvanovich, A., Cressman, R.F., 2007. Comparison of conventional FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of potential identities to known allergens. Mol. Nutr. Food Res. 51 (8), 985-998]. A number of protein sequence datasets were used as queries against the FARRP7 allergen database. Results indicated that conventional FASTA analysis produced fewer false positives than the "sliding window" search proposed by FAO/WHO. Further, both methods were able to identify the potential for cross-reactivity between the Bet v 1 family of proteins, indicating that the conventional FASTA search possessed sufficient sensitivity. Recently, collections of protein sequences from multiple crop species (corn, soy, barley, lettuce, sugar beets, and spinach) were subjected to the same screen against the FARRP7 allergen dataset. In all cases, the conventional FASTA search returned fewer above-threshold matches than the sliding window search. Examination of the matches not recognized by the conventional search revealed two scenarios: (1) "true" false positives, i.e., alignments of low statistical significance (as measured by E score, a measure of the potential random occurrence of aligned sequences used to evaluate the significance of an observed alignment) that were not contained in the conventional FASTA outputs, and (2) above-threshold sliding window alignments that fell below the 35% identity threshold with the conventional FASTA analysis.
Although some alignments within this second group were between regions of low sequence complexity, where there was little or no change in E score, the majority of the alignments were more significant (lower E scores) under the conventional FASTA algorithm, yet did not meet the 35% identity threshold defined by FAO/WHO. These data call into question the utility of the FAO/WHO-recommended sliding window FASTA compared with the traditional whole-sequence FASTA analysis coupled with appropriate statistical analysis.
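
The sliding window criterion described above can be illustrated with a minimal Python sketch. It is not the FASTA algorithm used in the study: as a loudly stated assumption, it scores each 80-residue window by ungapped residue-by-residue comparison at every placement along the allergen, whereas FASTA performs gapped local alignment and reports an E score. The function names (windows, best_ungapped_identity, sliding_window_hits) are hypothetical, and only the window length of 80 and the "greater than 35%" threshold come from the text.

```python
WINDOW = 80        # window length from the FAO/WHO recommendation
THRESHOLD = 35.0   # percent identity; the criterion is "greater than 35%"


def windows(seq, size=WINDOW):
    """Yield (start, fragment) for every overlapping window of `size` residues.

    A sequence shorter than `size` is yielded once, as a single fragment.
    """
    if len(seq) <= size:
        yield 0, seq
        return
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]


def best_ungapped_identity(fragment, target, size=WINDOW):
    """Best percent identity of `fragment` over all ungapped placements on `target`.

    ASSUMPTION: ungapped comparison stands in for FASTA's gapped alignment.
    Matches are divided by the full window size, so a short overlap at the
    end of the target cannot reach the threshold on its own.
    """
    best = 0.0
    for offset in range(-(len(fragment) - 1), len(target)):
        matches = sum(
            1
            for i, residue in enumerate(fragment)
            if 0 <= offset + i < len(target) and target[offset + i] == residue
        )
        best = max(best, 100.0 * matches / size)
    return best


def sliding_window_hits(query, allergen, size=WINDOW, threshold=THRESHOLD):
    """Return (window_start, percent_identity) for windows strictly above threshold."""
    hits = []
    for start, fragment in windows(query, size):
        identity = best_ungapped_identity(fragment, allergen, size)
        if identity > threshold:
            hits.append((start, identity))
    return hits
```

Because every overlapping window is scored independently and identity alone decides a hit, a short stretch of similarity can exceed the threshold without any assessment of statistical significance, which is the behavior the abstract contrasts with whole-sequence FASTA and its E scores.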

MeSH terms

  • Algorithms
  • Allergens / immunology*
  • Computational Biology
  • Crops, Agricultural / immunology
  • Cross Reactions
  • Databases, Protein
  • Plant Proteins / immunology*

Substances

  • Allergens
  • Plant Proteins