PLoS Comput Biol, 3 (4), e75

Coping With Viral Diversity in HIV Vaccine Design



David C Nickle et al. PLoS Comput Biol.


The ability of human immunodeficiency virus type 1 (HIV-1) to develop high levels of genetic diversity, and thereby acquire mutations to escape immune pressures, contributes to the difficulties in producing a vaccine. Possibly no single HIV-1 sequence can induce sufficiently broad immunity to protect against a wide variety of infectious strains, or to block the mutational escape pathways available to the virus after infection. The authors describe the generation of HIV-1 immunogens that minimize the phylogenetic distance of viral strains throughout the known viral population (the center of tree [COT]) and then extend the COT immunogen by adding a composite sequence that includes high-frequency variable sites preserved in their native contexts. The resulting COT(+) antigens compress the variation found in many independent HIV-1 isolates into lengths suitable for vaccine immunogens. It is possible to capture 62% of the variation found in the Nef protein and 82% of the variation in the Gag protein in immunogens of three gene lengths. The authors put forward immunogen designs that maximize representation of the diverse antigenic features present in a spectrum of HIV-1 strains. These immunogens should elicit immune responses against high-frequency viral strains as well as against most mutant forms of the virus.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.


Figure 1. 9mer Peptide Distribution Derived from 169 HIV-1 Subtype B Gag and Nef Protein Sequences
Each bin in the histogram represents the number of 9mers from a particular frequency class, plotted on a log scale. Only a few peptides are found at high frequencies, whereas most of the 9mers occur only once or twice. The score of a given frame is the sum of the frequencies of each unique 9mer contained by the frame. The possible extreme frequency values for each peptide, from all rare to all common, range from 1.198 × 10⁻⁵ to 0.0020 for Gag (black bars) and from 2.988 × 10⁻⁵ to 0.0051 for Nef (gray bars). The differences between the two distributions can be explained by differences in gene length and levels of conservation.
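The frame score described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that a 9mer's frequency is its share of all 9mer occurrences in the dataset, which the caption does not state explicitly.

```python
from collections import Counter

K = 9  # peptide length used in Figure 1


def kmers(seq, k=K):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequencies(sequences, k=K):
    """Relative frequency of each k-mer over all k-mer occurrences.

    Assumption: frequency = count / total occurrences in the dataset.
    """
    counts = Counter(p for s in sequences for p in kmers(s, k))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def frame_score(frame, freqs, k=K):
    """Sum the frequencies of the unique k-mers contained in the frame."""
    return sum(freqs.get(p, 0.0) for p in set(kmers(frame, k)))
```

Under this definition a frame made only of common 9mers scores high and a frame made only of singletons scores low, which matches the "all rare to all common" range the caption describes.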
Figure 2. The Effects of Stride versus Window Length on the Measure of Coverage
In each graph a three-gene-length COT+ construct is evaluated for coverage. Cold (blue) colors indicate high levels of coverage, and hot (red) colors indicate low levels of coverage. The diagonal in the topography represents the transition from strides shorter than window length to strides longer than window length. The maximal coverage at three gene lengths occurs with a window size of 17 and a stride of 1, with no smoothing, for both genes: 82% of the 9mer area is captured for Gag (A) and 62% of the 9mer area is captured for Nef (B). Near a window of 17 and a stride of 1 the surface is quite flat, so several pairs of parameters give similar results.
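The roles of window length and stride, and one way to measure coverage, can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: `frames` and `coverage` are hypothetical names, and coverage is taken to be the fraction of all 9mer occurrences in the dataset whose 9mer also appears in the construct (one reading of "9mer area").

```python
from collections import Counter


def frames(seq, window, stride):
    """Yield substrings of length `window`, starting every `stride` residues."""
    for start in range(0, len(seq) - window + 1, stride):
        yield seq[start:start + window]


def kmer_counts(sequences, k=9):
    """Count every overlapping k-mer across the given sequences."""
    return Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))


def coverage(construct_parts, sequences, k=9):
    """Fraction of k-mer occurrences in `sequences` found in the construct.

    Assumption: occurrences are weighted by how often each k-mer appears.
    """
    counts = kmer_counts(sequences, k)
    present = set(kmer_counts(construct_parts, k))
    total = sum(counts.values())
    covered = sum(c for p, c in counts.items() if p in present)
    return covered / total if total else 0.0
```

With a stride shorter than the window, consecutive frames overlap; with a stride longer than the window, residues between frames are skipped, which corresponds to the diagonal transition described in the caption.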
Figure 3. Coverage Comparison between COT+ and 100 Randomly Sampled (without Replacement) Sets of Sequences of the Same Length
The comparison at the single gene length is for HIV-1 subtype B Gag and Nef, and measures the COT sequence against randomly sampled database sequences. The COT+ captures all known variation in the training dataset at 33 gene lengths for Gag (A) and 67 gene lengths for Nef (B). Neither Gag nor Nef randomly sampled datasets will reach 100% coverage until 100% of the data is sampled.
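A random-sampling baseline of the kind shown in Figure 3 might be sketched as below. The helper names, the seed, and the occurrence-weighted coverage definition are assumptions made for illustration; only the use of 100 draws without replacement comes from the caption.

```python
import random
from collections import Counter


def kmer_counts(sequences, k=9):
    """Count every overlapping k-mer across the given sequences."""
    return Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))


def coverage(chosen, sequences, k=9):
    """Fraction of k-mer occurrences in `sequences` also present in `chosen`."""
    counts = kmer_counts(sequences, k)
    present = set(kmer_counts(chosen, k))
    total = sum(counts.values())
    covered = sum(c for p, c in counts.items() if p in present)
    return covered / total if total else 0.0


def random_baseline(sequences, n_genes, n_draws=100, k=9, seed=0):
    """Coverage of `n_draws` random sets of `n_genes` sequences.

    random.Random.sample draws without replacement, as in the caption.
    """
    rng = random.Random(seed)
    return [coverage(rng.sample(sequences, n_genes), sequences, k)
            for _ in range(n_draws)]
```

Comparing the distribution returned by `random_baseline` with the coverage of a designed construct of the same total length gives the kind of comparison the figure reports.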
Figure 4. The Distribution of the Number of Known Epitopes in Three Randomly Chosen Gag (Right-Side Distribution) and Nef (Left-Side Distribution) Genes from the Los Alamos National Laboratory Database
The COT+ sequence at three gene lengths for Gag has 98 out of 102 known CTL epitopes, and Nef has 40 out of 49 known CTL epitopes.
Figure 5. Possible Configurations for Vaccine Constructs
Each bar represents one unit-length gene. The fill intensity of each bar represents the density of unique peptides and known CTL epitopes. The coverage that each construct captures of the amino acid diversity of the dataset is shown on the right for both 9mers and epitopes. (A) COT+, composed of the estimated COT plus the appended high-frequency peptides (HFPs) composing the second and third gene lengths. (B) COT plus HFPs placed in a gene-collinear fashion on the second and third gene lengths. (C) COT plus two NSs chosen to maximize 9mer coverage. (D) All NSs of Gag and Nef sequences, chosen such that 9mer coverage is maximized. (E) For comparison, the average coverage across all NSs. The GenBank IDs of the NSs are written inside each bar.



