The criterion of >35% identity over 80 amino acids, used in the regulatory assessment of transgenic proteins for potential allergenicity, yields a high false positive rate, and the E-value of a database search changes with database size. To address both problems, Needleman-Wunsch global sequence alignment and a one-to-one (1:1) FASTA local alignment search (comparing against one protein in the target database at a time) were evaluated by comparing proteins randomly selected from Arabidopsis, rice, corn, and soybean with known allergens in a peer-reviewed allergen database (http://www.allergenonline.org/). Compared with the >35%/80aa+ search approach, the false positive rate, assessed via specificity for identification of true allergens, was reduced by a 1:1 global sequence alignment with a cut-off threshold of ≥30% identity and by a 1:1 FASTA local alignment with a cut-off E-value of ≤1.0E-09, while the same sensitivity was maintained. Hence, a 1:1 sequence comparison, especially using the FASTA local alignment tool with a biologically relevant E-value of 1.0E-09 as a threshold, is recommended for the regulatory assessment of sequence identities between transgenic proteins in food crops and known allergens.
Keywords: Allergen; Bioinformatics; Cross-reactivity; Protein; Transgenic.
Copyright © 2014 Elsevier Ltd. All rights reserved.
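The two 1:1 cut-offs stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the scoring scheme (match +1, mismatch -1, linear gap -1) stands in for a real substitution matrix, percent identity is assumed to be identical columns divided by alignment length, and the function names are hypothetical. The E-value is taken as an input, since computing it requires the FASTA program itself.

```python
GLOBAL_IDENTITY_CUTOFF = 30.0   # percent, from the abstract (>= 30% identity)
LOCAL_EVALUE_CUTOFF = 1.0e-09   # from the abstract (E-value <= 1.0E-09)


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a toy scoring scheme (assumption, not BLOSUM)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top, bottom):
    """Identical columns over alignment length (assumed definition)."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


def flagged_by_global(identity_percent):
    """1:1 global alignment rule: flag at >= 30% identity."""
    return identity_percent >= GLOBAL_IDENTITY_CUTOFF


def flagged_by_local(evalue):
    """1:1 FASTA local alignment rule: flag at E-value <= 1.0E-09."""
    return evalue <= LOCAL_EVALUE_CUTOFF
```

In use, each query protein would be aligned against one allergen at a time and flagged if either rule fires; because each comparison involves a single target sequence, the E-value does not shift as the allergen database grows.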