Background: Influenza A viruses generate an extreme genetic diversity through point mutation and gene segment exchange, resulting in many new strains that emerge from the animal reservoirs, among which was the recent highly pathogenic H5N1 virus. This genetic diversity also endows these viruses with a dynamic adaptability to their habitats, one result being the rapid selection of genomic variants that resist the immune responses of infected hosts. With the possibility of an influenza A pandemic, a critical need is a vaccine that will recognize and protect against any influenza A pathogen. One feasible approach is a vaccine containing conserved immunogenic protein sequences that represent the genotypic diversity of all current and future avian and human influenza viruses as an alternative to current vaccines that address only the known circulating virus strains.
Methodology/principal findings: Methodologies for large-scale analysis of the evolutionary variability of the influenza A virus proteins recorded in public databases were developed and used to elucidate the amino acid sequence diversity and conservation of 36,343 sequences of the 11 viral proteins of the recorded virus isolates of the past 30 years. Technologies were also applied to identify the conserved amino acid sequences from isolates of the past decade, and to evaluate the predicted human lymphocyte antigen (HLA) supertype-restricted class I and II T-cell epitopes of the conserved sequences. Fifty-five (55) sequences of 9 or more amino acids of the polymerases (PB2, PB1, and PA), nucleoprotein (NP), and matrix 1 (M1) proteins were completely conserved in at least 80%, many in 95 to 100%, of the avian and human influenza A virus isolates despite the marked evolutionary variability of the viruses. Almost all (50) of these conserved sequences contained putative supertype HLA class I or class II epitopes as predicted by 4 peptide-HLA binding algorithms. Additionally, data of the Immune Epitope Database (IEDB) include 29 experimentally identified HLA class I and II T-cell epitopes present in 14 of the conserved sequences.
Conclusions/significance: This study of all reported influenza A virus protein sequences, avian and human, has identified 55 highly conserved sequences, most of which are predicted to have immune relevance as T-cell epitopes. This is a necessary first step in the design and analysis of a polyepitope, pan-influenza A vaccine. In addition to the application described herein, these technologies can be applied to other pathogens and to other therapeutic modalities designed to attack DNA, RNA, or protein sequences critical to pathogen function.